lidavidm commented on pull request #9725: URL: https://github.com/apache/arrow/pull/9725#issuecomment-803044588
The motivation was to support more advanced users who might want to scan the same files repeatedly with different options. But that is a niche use case, and the common case is a bit confusing. Logically, the separation is roughly between 'things that would change the schema or format' (e.g. the separator, or rows to skip) and 'everything else' (e.g. the set of null values), but this isn't obvious to a user who probably just wants to specify all their options together.

Maybe the respective scan options could be inlined or embedded into the file format to provide defaults, which could then be overridden if a user wants to do something more complex? That would be some boilerplate, but would make the common case easier.
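As a rough illustration of the "defaults on the format, override per scan" idea, here is a minimal sketch. All class, field, and method names (`CsvScanOptions`, `CsvFormat`, `default_scan_options`, `resolve_scan_options`) are hypothetical and are not the actual Arrow API; the sketch only shows the shape of the proposal.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class CsvScanOptions:
    """Hypothetical per-scan options: these do not change the schema."""
    null_values: Tuple[str, ...] = ("", "NA")


@dataclass(frozen=True)
class CsvFormat:
    """Hypothetical file format: schema-affecting options plus scan defaults."""
    delimiter: str = ","
    skip_rows: int = 0
    # Scan options embedded in the format so users can set everything in one place.
    default_scan_options: CsvScanOptions = field(default_factory=CsvScanOptions)

    def resolve_scan_options(
        self, override: Optional[CsvScanOptions] = None
    ) -> CsvScanOptions:
        """Use the per-scan override if one is given, else the format's defaults."""
        return override if override is not None else self.default_scan_options


# Common case: all options are specified together on the format.
fmt = CsvFormat(
    delimiter="|",
    default_scan_options=CsvScanOptions(null_values=("", "NULL")),
)
common = fmt.resolve_scan_options()

# Advanced case: scan the same files again with different scan options.
advanced = fmt.resolve_scan_options(CsvScanOptions(null_values=("-",)))
```

In this sketch the override replaces the defaults wholesale rather than merging field by field; whether a merge would be preferable is a separate design question that the comment above does not settle.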
