bbannier commented on PR #10467: URL: https://github.com/apache/datafusion/pull/10467#issuecomment-2106229273
This is currently a sketch of a possible implementation for #10262. The approach taken here pushes interpretation of comment lines into `arrow-csv`, with apache/arrow-rs#5759 adding support for that; the task here is then to plumb a `datafusion` comment config setting through to `arrow-csv`.

If this is a viable solution, it would require bumping at least the `arrow-csv` dependency in `datafusion` to a version that supports comments. To explore that, I prefixed the actual implementation patch here with two patches performing that bump. It appears that bumping to the `master` version of `arrow-csv` (or any other crate from the collection at https://github.com/apache/arrow-rs) requires changes to `datafusion`. I attempted that bump, but some issues remain:

```
---- aggregates::tests::aggregate_source_not_yielding_with_spill stdout ----
Error: ResourcesExhausted("Failed to allocate additional 2208 bytes for GroupedHashAggregateStream[0] with 348 bytes already allocated - maximum available is 1600")

---- aggregates::tests::aggregate_source_with_yielding_with_spill stdout ----
Error: ResourcesExhausted("Failed to allocate additional 2208 bytes for GroupedHashAggregateStream[0] with 348 bytes already allocated - maximum available is 1600")

---- aggregates::tests::run_first_last_multi_partitions stdout ----
Error: ResourcesExhausted("Failed to allocate additional 3704 bytes for GroupedHashAggregateStream[0] with 437 bytes already allocated - maximum available is 3200")
```

@alamb, would you be open to shepherding this PR and https://github.com/apache/arrow-rs/pull/5759, or alternatively could you identify someone who could?
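To make the plumbing step concrete, here is a minimal, std-only Rust sketch of an optional comment byte being forwarded from a `datafusion`-level setting into a reader configuration and then used to skip comment lines. `CsvConfig`, `ReaderFormat`, `with_comment`, and `build_format` are hypothetical stand-ins for illustration, not the actual `datafusion` or `arrow-csv` API. The record splitting is deliberately naive (no quoting or escaping) and only shows the comment-skipping behavior.

```rust
/// Hypothetical stand-in for a DataFusion-level CSV option set.
#[derive(Debug, Clone, Default)]
pub struct CsvConfig {
    /// Byte that marks a line as a comment; `None` disables comment handling.
    pub comment: Option<u8>,
    pub delimiter: u8,
}

/// Hypothetical stand-in for the reader configuration the option is forwarded to.
#[derive(Debug, Clone)]
pub struct ReaderFormat {
    comment: Option<u8>,
    delimiter: u8,
}

impl ReaderFormat {
    pub fn new(delimiter: u8) -> Self {
        Self { comment: None, delimiter }
    }

    /// Hypothetical builder method enabling comment handling.
    pub fn with_comment(mut self, comment: u8) -> Self {
        self.comment = Some(comment);
        self
    }

    /// Splits `input` into records, dropping empty lines and lines that
    /// start with the configured comment byte (naive: no quoting support).
    pub fn records(&self, input: &str) -> Vec<Vec<String>> {
        input
            .lines()
            .filter(|line| !line.is_empty())
            .filter(|line| match self.comment {
                Some(c) => line.as_bytes().first() != Some(&c),
                None => true,
            })
            .map(|line| line.split(self.delimiter as char).map(str::to_string).collect())
            .collect()
    }
}

/// Plumbing step: forward the optional comment setting only when it is set.
pub fn build_format(config: &CsvConfig) -> ReaderFormat {
    let format = ReaderFormat::new(config.delimiter);
    match config.comment {
        Some(c) => format.with_comment(c),
        None => format,
    }
}

fn main() {
    let config = CsvConfig { comment: Some(b'#'), delimiter: b',' };
    let format = build_format(&config);
    let rows = format.records("a,b\n# skipped\n1,2\n");
    println!("{:?}", rows); // prints [["a", "b"], ["1", "2"]]
}
```

Keeping the setting as an `Option<u8>` means that leaving it unset preserves the existing behavior, where no line is treated as a comment.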