westonpace commented on issue #28866: URL: https://github.com/apache/arrow/issues/28866#issuecomment-2095929942
Is the JNI -> dataset API using Substrait today? If so, then I think option 1 is preferred. However, the user should not have to provide an `AdvancedExtension`; this should be taken care of internally. For example, the user-facing Java API for "scan CSV" could take in a `HashMap<String, String>`. The Java implementation can convert this into an `AdvancedExtension` similar to the one described in `incubator-gluten`. Acero could be updated to detect this and configure the fragment options appropriately.

I will add, for consideration, a third proposal, which is that we could add CSV as a read type in Substrait (e.g. make a PR on the Substrait repo). There was some work started [here](https://github.com/substrait-io/substrait/issues/174) but it was never finished. That third option will be slower though: it would require Substrait alignment on what CSV options are common, which will take time and energy. It would be the most useful long-term option.
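To make the "options map in, extension out" idea a bit more concrete, here is a rough sketch of the conversion. It is only an illustration under assumptions: `CsvScanOptionsSketch`, `OpaqueExtension`, the type URL, the length-prefixed encoding, and the option names are all hypothetical and do not exist in Arrow, Substrait, or Gluten. `OpaqueExtension` stands in for the protobuf `Any` payload that, as far as I understand, an `AdvancedExtension` carries; a real implementation would use the generated Substrait protobuf classes, and the decoding side would live in C++ (Acero), not Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are real Arrow/Substrait/Gluten APIs.
final class CsvScanOptionsSketch {
    // Hypothetical type URL the consumer would use to recognize the payload.
    static final String TYPE_URL = "example.invalid/csv_scan_options";

    // Stand-in for a protobuf Any: a type URL plus opaque bytes.
    static final class OpaqueExtension {
        final String typeUrl;
        final byte[] payload;

        OpaqueExtension(String typeUrl, byte[] payload) {
            this.typeUrl = typeUrl;
            this.payload = payload;
        }
    }

    // Producer side: what the Java "scan CSV" API could do internally,
    // so the user only ever passes a plain options map.
    static OpaqueExtension encode(Map<String, String> options) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            // Sort keys so the same options always produce the same bytes.
            Map<String, String> sorted = new TreeMap<>(options);
            out.writeInt(sorted.size());
            for (Map.Entry<String, String> entry : sorted.entrySet()) {
                // Length-prefixed strings, so values may contain any character.
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return new OpaqueExtension(TYPE_URL, bytes.toByteArray());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Consumer side: detect the extension by type URL and recover the options,
    // which would then be mapped onto the CSV fragment scan options.
    static Map<String, String> decode(OpaqueExtension ext) {
        if (!TYPE_URL.equals(ext.typeUrl)) {
            throw new IllegalArgumentException("unexpected extension: " + ext.typeUrl);
        }
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(ext.payload));
            int count = in.readInt();
            Map<String, String> options = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                options.put(key, value);
            }
            return options;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("delimiter", ";");      // hypothetical option name
        userOptions.put("skip_rows", "1");      // hypothetical option name
        OpaqueExtension ext = encode(userOptions);
        System.out.println(ext.typeUrl + " -> " + decode(ext));
    }
}
```

The point of the sketch is the division of labor: the user-facing API stays a simple string map, and the extension wrapping and unwrapping stay internal on both sides.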
