Re: [I] [Java] Java Dataset API ScanOptions expansion [arrow]

via GitHub Mon, 06 May 2024 05:41:38 -0700


westonpace commented on issue #28866:
URL: https://github.com/apache/arrow/issues/28866#issuecomment-2095929942


   Is the JNI -> dataset API using Substrait today?  If so, then I think option 1 
is preferred.  However, the user should not have to provide the 
`AdvancedExtension`; this should be taken care of internally.
   
   For example, the user-facing Java API for "scan CSV" could take in a 
`HashMap<String, String>`.  The Java implementation can convert this into an 
`AdvancedExtension` similar to the one described in `incubator-gluten`.  Acero 
could be updated to detect this and configure the fragment options appropriately.
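
   As a rough illustration of the conversion step only, here is a minimal 
sketch of turning the user's option map into a payload that the Java 
implementation could later wrap in an `AdvancedExtension`.  Everything in it is 
an assumption made for illustration: the class name `CsvOptionsPayload`, the 
sorted `key=value` line encoding, and the option names in the usage note are 
hypothetical and are not taken from Arrow, Substrait, or `incubator-gluten`.  It 
deliberately avoids the Substrait classes so that it stands alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of the Arrow Java Dataset API.
// It only shows one possible way to flatten user-supplied CSV options
// into a payload that could be carried inside an extension message.
final class CsvOptionsPayload {
    private CsvOptionsPayload() {}

    // Encode the options as "key=value" lines, sorted by key so that the
    // same map always produces the same payload. (Assumed format.)
    static String encode(Map<String, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(options).entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            if (key.isEmpty() || key.indexOf('=') >= 0 || key.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("invalid option name: " + key);
            }
            if (value == null || value.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("invalid value for option: " + key);
            }
            sb.append(key).append('=').append(value).append('\n');
        }
        return sb.toString();
    }

    // Decode a payload produced by encode(). The receiving side (for
    // example native code) would do the equivalent before applying the
    // options to the scan.
    static Map<String, String> decode(String payload) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String line : payload.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("malformed line: " + line);
            }
            // Split on the first '=' only, so values may contain '='.
            out.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return out;
    }
}
```

   With this sketch, a map holding the hypothetical entries `delimiter` -> `;` 
and `quote_char` -> `"` encodes to two lines, `delimiter=;` and `quote_char="`, 
and `decode` recovers the original map.  The point is only that the user 
supplies plain strings and never touches `AdvancedExtension` directly.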
   
   I will add, for consideration, a third proposal, which is that we could add 
CSV as a read type in Substrait (e.g. make a PR on the Substrait repo).  There 
was some work started 
[here](https://github.com/substrait-io/substrait/issues/174) but it was never 
finished.
   
   That third option will be slower, though.  It would require alignment within 
Substrait on which CSV options are common, which will take time and energy.  
However, it would be the most useful long-term option.


