[GitHub] [arrow] nealrichardson commented on pull request #9725: ARROW-8631: [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python

GitBox Fri, 19 Mar 2021 12:12:36 -0700


nealrichardson commented on pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#issuecomment-803052664



   > Maybe the respective scan options could be inlined or embedded into the file format to provide defaults?
   
   Yeah, I think that would be nice. I don't really understand the use case for scanning the same files with different parsing options, unless I'm trying to figure out what the "right" options are. To me, things like `null_values` are not scan-time preferences; they're properties that describe what's in the files, so I want to declare them up front and not have to adjust them later.
   
   Is there a reason one would need to scan the same dataset with different parsing options, rather than create a new dataset with the options specified up front? I wonder whether the extra complexity of also accepting them at scan time is worth it, given a simple alternative like that.



