[GitHub] [arrow] lidavidm commented on pull request #10060: ARROW-9697: [C++][Python][R][Dataset] Add CountRows for Scanner

GitBox Fri, 16 Apr 2021 10:22:27 -0700


lidavidm commented on pull request #10060:
URL: https://github.com/apache/arrow/pull/10060#issuecomment-821323392



   Note this is somewhat of a regression for CSV files, and for calling 
`dim.Dataset` in R: we now have to scan the files instead of immediately 
returning NA. We do have some options:
   - We could add an option to just fail if a "cheap" count can't be performed, 
so that R could fall back to reporting just NA.
   - We could optimize the CSV case like the IPC and Parquet ones. This should 
be possible when `newlines_in_values` is not set, and needs some consideration 
for `ignore_empty_lines`. This may or may not actually be all that much 
cheaper than loading the data. 
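
   To make the two options concrete, here is a rough sketch in plain Python 
(not the actual C++ reader). The function name `cheap_csv_row_count` and the 
`has_header` parameter are hypothetical; only `newlines_in_values` and 
`ignore_empty_lines` come from the discussion above. It assumes that, without 
newlines inside values, every line break ends a row, and it returns `None` 
when no cheap count is possible so a caller could report NA instead.

```python
from typing import Optional


def cheap_csv_row_count(
    data: bytes,
    newlines_in_values: bool = False,
    ignore_empty_lines: bool = True,
    has_header: bool = True,  # assumption: the first counted line is a header
) -> Optional[int]:
    """Count CSV rows by counting lines, without parsing any values.

    Returns None when a cheap count is not possible, so the caller can
    fall back to a full scan or report "unknown".
    """
    if newlines_in_values:
        # A quoted value may span several lines, so line breaks no longer
        # map one-to-one onto rows; counting would require real parsing.
        return None

    # bytes.splitlines() handles \n, \r\n and \r, and does not produce a
    # trailing empty element for a final newline.
    lines = data.splitlines()

    if ignore_empty_lines:
        # Empty lines are skipped rather than counted as rows.
        lines = [line for line in lines if line]

    header_lines = 1 if has_header else 0
    return max(len(lines) - header_lines, 0)
```

   A caller wanting the first option would treat `None` as "fail, report NA"; 
a caller wanting an exact answer would fall back to scanning. Note the sketch 
still reads every byte, which is why this may not be much cheaper than 
loading the data.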
   
   Also, I need to refactor this to pass around a ScanOptions instead of an 
IOContext.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
