bbannier commented on PR #10467:
URL: https://github.com/apache/datafusion/pull/10467#issuecomment-2106229273

   This is currently a sketch of a possible implementation for #10262. The 
approach pushes the interpretation of comment lines down into `arrow-csv`, with 
apache/arrow-rs#5759 adding support for that; the task here is then to plumb a 
`datafusion` comment config setting through to `arrow-csv`.
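
To make the intended plumbing concrete, here is a minimal, dependency-free sketch 
of the idea. It is not the code in this PR or in apache/arrow-rs#5759: the names 
`CsvOptionsSketch`, `ReaderConfigSketch` and `read_records` are hypothetical 
stand-ins, and the rule "a line whose first byte is the comment byte is skipped" 
is an assumption about the semantics.

```rust
/// Hypothetical stand-in for the `datafusion`-side CSV options.
#[derive(Debug, Clone, Default)]
pub struct CsvOptionsSketch {
    /// Byte that marks a comment line; `None` disables comment handling.
    pub comment: Option<u8>,
}

/// Hypothetical stand-in for the reader-side setting in `arrow-csv`.
#[derive(Debug, Clone, Default)]
pub struct ReaderConfigSketch {
    pub comment: Option<u8>,
}

impl From<&CsvOptionsSketch> for ReaderConfigSketch {
    /// The "plumbing" step: forward the user-facing option to the reader.
    fn from(options: &CsvOptionsSketch) -> Self {
        Self { comment: options.comment }
    }
}

/// Split `input` into records, dropping empty lines and (assumed semantics)
/// lines whose first byte equals the configured comment byte.
pub fn read_records(input: &str, config: &ReaderConfigSketch) -> Vec<Vec<String>> {
    input
        .lines()
        .filter(|line| !line.is_empty())
        .filter(|line| match config.comment {
            Some(c) => line.as_bytes().first() != Some(&c),
            None => true,
        })
        // Naive field splitting; real CSV quoting is out of scope here.
        .map(|line| line.split(',').map(str::to_string).collect())
        .collect()
}
```

With `comment: Some(b'#')` a line such as `# note` is dropped before parsing; with 
the default (`None`) it is returned as an ordinary single-field record.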
   
   If this is a viable solution, it would require bumping at least the 
`arrow-csv` dependency in `datafusion` to a version containing support for 
comments. To at least explore that, I prefixed the actual implementation patch 
here with two patches performing that bump. It appears that a bump to the 
`master` version of `arrow-csv` (or of any other crate from the collection in 
https://github.com/apache/arrow-rs) requires changes to `datafusion`; 
I attempted to perform that bump, but some test failures still remain:
   
   ```
   ---- aggregates::tests::aggregate_source_not_yielding_with_spill stdout ----
   Error: ResourcesExhausted("Failed to allocate additional 2208 bytes for GroupedHashAggregateStream[0] with 348 bytes already allocated - maximum available is 1600")
   
   ---- aggregates::tests::aggregate_source_with_yielding_with_spill stdout ----
   Error: ResourcesExhausted("Failed to allocate additional 2208 bytes for GroupedHashAggregateStream[0] with 348 bytes already allocated - maximum available is 1600")
   
   ---- aggregates::tests::run_first_last_multi_partitions stdout ----
   Error: ResourcesExhausted("Failed to allocate additional 3704 bytes for GroupedHashAggregateStream[0] with 437 bytes already allocated - maximum available is 3200")
   ```
   
   @alamb, would you be open to shepherding this PR and 
https://github.com/apache/arrow-rs/pull/5759, or alternatively could you 
identify someone who could?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

