[GitHub] [arrow] lidavidm commented on pull request #9725: ARROW-8631: [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python

GitBox Fri, 19 Mar 2021 11:56:56 -0700


lidavidm commented on pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#issuecomment-803044588



   The motivation was to support more advanced users who might want to scan the 
same files repeatedly with different options. But that is a niche use case, and 
it makes the common case a bit confusing. Logically, the separation is roughly 
between 'things that would change the schema or format' (e.g. the separator or 
the number of rows to skip) and 'everything else' (e.g. the set of null 
values), but this isn't obvious to a user who probably just wants to specify 
all their options together.
   
   Maybe the respective scan options could be inlined or embedded into the file 
format to provide defaults, which could then be overridden if a user wants to 
do something more complex? That would add some boilerplate, but would make 
things easier.
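
   As a rough illustration of that idea (a minimal sketch only, not the actual 
Arrow API): the class names `ScanOptions` and `CsvFormat`, their fields, and 
`resolve_scan_options` are all hypothetical, chosen just to show a format 
object carrying default scan options that a single scan can override.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScanOptions:
    """Hypothetical per-scan options: 'everything else', e.g. null values."""
    null_values: Tuple[str, ...] = ("", "NULL")


@dataclass(frozen=True)
class CsvFormat:
    """Hypothetical format: options that would change the schema or format."""
    delimiter: str = ","
    skip_rows: int = 0
    # Scan options embedded in the format, used when a scan gives no override.
    default_scan_options: ScanOptions = field(default_factory=ScanOptions)

    def resolve_scan_options(
        self, override: Optional[ScanOptions] = None
    ) -> ScanOptions:
        # An explicit per-scan override wins; otherwise use the format's defaults.
        return override if override is not None else self.default_scan_options


# Common case: all options are specified together on the format.
fmt = CsvFormat(
    delimiter=";",
    default_scan_options=ScanOptions(null_values=("N/A",)),
)
common = fmt.resolve_scan_options()

# Advanced case: scan the same files again with different scan options.
advanced = fmt.resolve_scan_options(ScanOptions(null_values=("-",)))
```

   In this sketch the override replaces the defaults wholesale; merging 
individual fields would be another option, at the cost of more boilerplate.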


