LantaoJin opened a new pull request, #53: URL: https://github.com/apache/datafusion-java/pull/53
## Which issue does this PR close?

- Closes #38.

## Rationale for this change

`DataFrame.writeParquet` (#27) lets Java callers materialize a query result, but CSV remains read-only. DataFusion supports `DataFrame::write_csv` upstream with the full writer-side option surface (delimiter, quote, escape, null token, compression, partitioning, single-file vs. directory output). Issue #38 tracks exposing it on the Java side.

The CSV write surface is wider than Parquet's — six writer-side knobs plus `singleFileOutput` and `partitionCols` — so this PR uses the proto-over-JNI pattern (introduced in #29 and reused by all the read-side option classes) instead of the wide-JNI pattern `writeParquet` shipped with. Sending a single `byte[]` keeps the JNI signature stable as more knobs are added.

## What changes are included in this PR?

- `proto/csv_write_options.proto` — new `CsvWriteOptionsProto` message. Fields are `optional` so unset values preserve DataFusion's defaults; `partition_cols` is `repeated` so the empty list round-trips unambiguously. `FileCompressionType` is reused from `csv_read_options.proto` because the codec set is identical between read and write at the upstream level. PR #47 promotes the enum to a shared `compression.proto`; once that lands, this PR's import switches one line.
- `CsvWriteOptions` — a Java builder mirroring the upstream `CsvOptions` writer-side API: `singleFileOutput`, `partitionCols`, `hasHeader`, `delimiter`, `quote`, `escape`, `nullValue`, `fileCompressionType`. All defaults are unset (null), so callers only pay for the knobs they touch.
- `DataFrame.writeCsv(String)` and `DataFrame.writeCsv(String, CsvWriteOptions)` overloads with up-front null-argument validation. The receiver remains usable after the call, matching `writeParquet`'s "retain after write" semantics.
- `Java_org_apache_datafusion_DataFrame_writeCsvWithOptions` — JNI handler in `native/src/csv.rs`, co-located with the read-side handlers since they share the proto-decode plumbing.
Decodes the proto, builds `DataFrameWriteOptions` and an `Option<CsvOptions>`, then calls `DataFrame::write_csv`. The `Option<CsvOptions>` is left as `None` when no writer knob is set, so DataFusion's defaults apply.

Out of scope (for follow-ups):

- Other writer fields exposed by upstream `CsvOptions` but not in #38's checklist: `terminator`, `doubleQuote`, `dateFormat`, `datetimeFormat`, `timestampFormat`, `timestampTzFormat`, `timeFormat`, `compressionLevel`, `truncatedRows`. These are an easy follow-up — same proto, just add fields.
- `compression_level` — separate from the compression codec; upstream `CsvOptions` exposes `with_compression_level`, but the issue doesn't list it.

## Are these changes tested?

Yes — 11 new tests across `CsvWriteOptionsTest` and `DataFrameWriteCsvTest`.

## Are there any user-facing changes?

Yes, purely additive. New public API:

- `org.apache.datafusion.CsvWriteOptions`
- `DataFrame.writeCsv(String)`
- `DataFrame.writeCsv(String, CsvWriteOptions)`

The new `org.apache.datafusion.protobuf.CsvWriteOptionsProto` generated class is also exposed via the protobuf-Java output, consistent with how the read-side option protos are exposed. No API removals, no deprecations, no behavior change for existing callers.
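For orientation, here is a plausible shape for the new message, inferred from the description above. The field names follow the PR text, but the field numbers, scalar types, and package/import paths are assumptions — the committed `proto/csv_write_options.proto` is authoritative:

```proto
// Sketch only — not the committed schema.
syntax = "proto3";

package datafusion.java;

// Reuses FileCompressionType from the read side until PR #47
// promotes it to a shared compression.proto.
import "csv_read_options.proto";

message CsvWriteOptionsProto {
  // `optional` gives explicit presence tracking: unset fields
  // preserve DataFusion's defaults on the native side.
  optional bool single_file_output = 1;
  // `repeated` so an empty partition list round-trips unambiguously.
  repeated string partition_cols = 2;
  optional bool has_header = 3;
  optional uint32 delimiter = 4;  // single byte
  optional uint32 quote = 5;      // single byte
  optional uint32 escape = 6;     // single byte
  optional string null_value = 7;
  optional FileCompressionType file_compression_type = 8;
}
```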

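The "unset means upstream default" builder contract described above can be sketched as follows. This is a self-contained illustration, not the shipped `org.apache.datafusion.CsvWriteOptions` class — the class name, method names, and the `anyWriterKnobSet` helper here are hypothetical; it only demonstrates the nullable-field pattern and the native side's Some/None decision:

```java
import java.util.List;

// Hypothetical sketch of the nullable-field builder pattern this PR describes.
final class CsvWriteOptionsSketch {
    private final Boolean hasHeader;          // null = keep DataFusion's default
    private final Byte delimiter;             // null = keep DataFusion's default
    private final String nullValue;           // null = keep DataFusion's default
    private final List<String> partitionCols; // empty list, never null

    private CsvWriteOptionsSketch(Builder b) {
        this.hasHeader = b.hasHeader;
        this.delimiter = b.delimiter;
        this.nullValue = b.nullValue;
        this.partitionCols = b.partitionCols;
    }

    static Builder builder() { return new Builder(); }

    List<String> partitionCols() { return partitionCols; }

    /**
     * True when any writer-side knob was touched; mirrors the native
     * side's decision to pass Some(CsvOptions) instead of None.
     */
    boolean anyWriterKnobSet() {
        return hasHeader != null || delimiter != null || nullValue != null;
    }

    static final class Builder {
        private Boolean hasHeader;
        private Byte delimiter;
        private String nullValue;
        private List<String> partitionCols = List.of();

        Builder withHasHeader(boolean v) { this.hasHeader = v; return this; }
        Builder withDelimiter(byte v) { this.delimiter = v; return this; }
        Builder withNullValue(String v) { this.nullValue = v; return this; }
        Builder withPartitionCols(List<String> v) { this.partitionCols = v; return this; }

        CsvWriteOptionsSketch build() { return new CsvWriteOptionsSketch(this); }
    }
}
```

The design point is that a freshly built options object carries no opinions: only explicitly set fields are serialized into the proto, and the native handler maps an all-unset message to `None` so DataFusion's own defaults apply end to end.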