LantaoJin opened a new pull request, #60:
URL: https://github.com/apache/datafusion-java/pull/60

   ## Which issue does this PR close?
   
   - Closes #36.
   
   ## Rationale for this change
   
   DataFusion 53.1 supports Avro via `SessionContext::read_avro` / 
`register_avro`. The Java binding for Avro is still missing.
   
   I measured release builds before and after on the same machine: 
`libdatafusion_jni.so` grew from 146,983,936 bytes to 151,566,800 bytes, **+4.4 
MiB** unstripped. That is modest, given that Avro is comparable in scope to the 
Parquet/CSV readers that are already on by default. Keeping Avro always on means 
Java callers can rely on it being present without juggling Cargo features 
through the Maven build.
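
For reviewers who want to re-check the size figure, here is a minimal sketch of the arithmetic. The class and method names (`BinarySizeDelta`, `deltaBytes`, `toMiB`, `growthPercent`) are hypothetical helpers written for this note and are not part of the PR; only the two byte counts come from the measurement above.

```java
import java.util.Locale;

// Hypothetical helper (not part of this PR): re-derives the size delta
// quoted above from the two measured byte counts.
public class BinarySizeDelta {
    static final long BYTES_PER_MIB = 1024L * 1024L;

    // Absolute growth in bytes.
    static long deltaBytes(long before, long after) {
        return after - before;
    }

    // Converts a byte count to mebibytes (1 MiB = 1,048,576 bytes).
    static double toMiB(long bytes) {
        return (double) bytes / BYTES_PER_MIB;
    }

    // Relative growth as a percentage of the original size.
    static double growthPercent(long before, long after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        long before = 146_983_936L; // libdatafusion_jni.so without Avro
        long after = 151_566_800L;  // libdatafusion_jni.so with Avro
        long delta = deltaBytes(before, after);
        System.out.printf(Locale.ROOT,
                "delta = %d bytes = %.2f MiB (+%.2f%%)%n",
                delta, toMiB(delta), growthPercent(before, after));
    }
}
```

The delta is 4,582,864 bytes, about 4.37 MiB, which rounds to the 4.4 MiB quoted above and is roughly a 3.1% increase over the previous binary.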
   
   ## What changes are included in this PR?
   
   - `proto/avro_read_options.proto` -- new `AvroReadOptionsProto` message.
   - `AvroReadOptions` Java builder with `fileExtension(String)` and 
`schema(Schema)` setters.
   - `native/src/avro.rs` JNI module.
   - `native/build.rs`, `native/src/lib.rs`: register the new proto and the new 
module.
   - `pom.xml` + `core/pom.xml`: add `org.apache.avro:avro` (1.12.0) in test 
scope.
   
   ## Are these changes tested?
   
   Yes -- 9 new tests across `AvroReadOptionsTest` and `SessionContextAvroTest`.
   
   ## Are there any user-facing changes?
   
   Yes -- purely additive. New public API:
   
   - `org.apache.datafusion.AvroReadOptions`
   - `SessionContext.registerAvro(String, String)`
   - `SessionContext.registerAvro(String, String, AvroReadOptions)`
   - `SessionContext.readAvro(String) → DataFrame`
   - `SessionContext.readAvro(String, AvroReadOptions) → DataFrame`
   
   The new `org.apache.datafusion.protobuf.AvroReadOptionsProto` generated 
class is also exposed via the protobuf-Java output, consistent with how 
`CsvReadOptionsProto`, `ArrowReadOptionsProto`, etc. are exposed. No API 
removals, no deprecations, no behavior change for existing callers.
   
   The native binary grows ~4.4 MiB unstripped to enable Avro's datasource 
crate (see Rationale).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

