jecsand838 opened a new pull request, #8123: URL: https://github.com/apache/arrow-rs/pull/8123
# Which issue does this PR close?

- Part of https://github.com/apache/arrow-rs/issues/4886

# Rationale for this change

This PR introduces an Avro writer implementation to the `arrow-avro` crate, enabling Arrow RecordBatches to be serialized into Avro format. This feature enhances the bidirectional interoperability between Arrow and Avro.

# What changes are included in this PR?

- Added `Writer`, `WriterBuilder`, and `AvroFormat` abstractions:
  - Support for **Object Container Files (OCF)**: includes metadata and sync markers for standalone Avro files.
  - Support for raw **Avro binary streams**: minimal framing for environments like message brokers.
- Core encoder (`encoder.rs`) implementation:
  - Encodes Arrow `RecordBatch` into Avro binary format.
  - Includes support for primitive, nullable, and complex types (e.g., timestamps, binary, float).
- Added support for `CompressionCodec` (e.g., Snappy, Deflate, ZStandard, etc.) for OCF files.
- Type-specific encoding: ZigZag variable-length integers, prefixed binary, and null representation.
- Added tests to verify behavior, schema validation, and compression functionality:
  - `test_finish_without_write` ensures a proper header is written even with no data.
  - `test_ocf_writer_generates_header_and_sync` checks header and sync marker correctness.

# Are these changes tested?

Yes. The implementation includes unit and integration tests:

- Verified schema validation, record writing, sync marker correctness.
- Compression-enabled file writing and round-trip validation.
- Exhaustive tests for compatibility with Arrow schemas and data types.

# Are there any user-facing changes?

N/A

# Follow-Up PRs

- Add Impala Nullability support
- Performance optimizations for large batch encoding.
- Add remaining types support and round trip tests for encoder.
- Implement Avro Binary Stream.
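For readers unfamiliar with the "type-specific encoding" item above, here is a minimal sketch of how the Avro specification encodes longs (ZigZag plus base-128 varint), length-prefixed bytes, and a nullable value. It is not taken from the PR's `encoder.rs`: the function names `zigzag`, `write_long`, `write_bytes`, and `write_nullable_string` are hypothetical, and the `["null", "string"]` union ordering (null at index 0) is an assumption made for illustration.

```rust
/// ZigZag-map a signed value so that small magnitudes (positive or negative)
/// become small unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Append `n` as an Avro `long`: ZigZag mapping, then 7 bits per byte,
/// least-significant group first, with the high bit set on every byte
/// except the last.
fn write_long(n: i64, out: &mut Vec<u8>) {
    let mut z = zigzag(n);
    while z >= 0x80 {
        out.push(((z & 0x7F) as u8) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Append an Avro `bytes` or `string` value: the length as a long,
/// followed by the raw bytes.
fn write_bytes(data: &[u8], out: &mut Vec<u8>) {
    write_long(data.len() as i64, out);
    out.extend_from_slice(data);
}

/// Append an optional string under an assumed `["null", "string"]` union:
/// the branch index is written as a long, and the null branch has no payload.
fn write_nullable_string(value: Option<&str>, out: &mut Vec<u8>) {
    match value {
        None => write_long(0, out), // branch 0: null, encoded as zero bytes
        Some(s) => {
            write_long(1, out); // branch 1: string
            write_bytes(s.as_bytes(), out);
        }
    }
}
```

Under these assumptions, `write_long(64, &mut buf)` appends the two bytes `0x80 0x01`, and `write_nullable_string(Some("hi"), &mut buf)` appends `0x02 0x04 0x68 0x69`: the branch index, the length, and the UTF-8 bytes.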