lidavidm commented on pull request #10060:
URL: https://github.com/apache/arrow/pull/10060#issuecomment-821323392


   Note this is somewhat of a regression for CSV files, for instance if you 
call dim.Dataset in R, since we'll now have to scan the files instead of 
immediately returning NA. We do have some options:
   - We could add an option to just fail if a "cheap" count can't be performed, 
so that R could fall back to reporting just NA.
   - We could optimize the CSV case like the IPC and Parquet ones. This should 
be possible when `newlines_in_values` is not set, though `ignore_empty_lines` 
needs some consideration. This may or may not actually be all that much 
cheaper than loading the data. 
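
To make the two options concrete, here is a rough Python sketch of what a "cheap" CSV row count with a fallback could look like. This is not the Arrow implementation: the function name `cheap_csv_row_count` and the `has_header` parameter are hypothetical, and it assumes rows are terminated by `\n` or `\r\n`. It returns `None` when a cheap count isn't possible (the "just fail" option, which a caller such as R could map to NA) and otherwise counts line terminators without parsing any values.

```python
import io
from typing import BinaryIO, Optional


def cheap_csv_row_count(
    stream: BinaryIO,
    *,
    newlines_in_values: bool = False,
    ignore_empty_lines: bool = True,
    has_header: bool = True,  # hypothetical parameter, for illustration only
) -> Optional[int]:
    """Count CSV data rows without parsing values, or return None.

    None means "no cheap count available": with newlines_in_values set,
    a line break may sit inside a quoted value, so counting lines would
    overcount and a real parse would be needed.
    """
    if newlines_in_values:
        return None

    count = 0
    # Iterating a binary stream yields chunks split on b"\n".
    for line in stream:
        if ignore_empty_lines and line.strip(b"\r\n") == b"":
            continue  # skip lines that contain only a line terminator
        count += 1

    # Assume the first counted line is the header when has_header is set.
    if has_header and count > 0:
        count -= 1
    return count


if __name__ == "__main__":
    data = b"a,b\n1,2\n\n3,4\n"
    print(cheap_csv_row_count(io.BytesIO(data)))
    print(cheap_csv_row_count(io.BytesIO(data), newlines_in_values=True))
```

Even this version still reads every byte of the file, which is why it may not end up much cheaper than loading the data.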
   
   Also, I need to refactor this to pass around a ScanOptions instead of an 
IOContext.

