marsupialtail commented on PR #14078: URL: https://github.com/apache/arrow/pull/14078#issuecomment-1241119871
To recap the discussion: we want to add slicing to the FileFragment class so that a Reader can be opened over just a partial byte range of a file. This PR implements a Slice method on FileFragment that returns a new FileFragment restricted to a specified byte range.

For the CSV format, this currently assumes that the supplied byte range respects line breaks, i.e. it starts at the beginning of a line and ends at the end of one. If the range starts or ends in the middle of a line, the Reader will throw an error when it parses the first or last block.

This PR does not add slicing for Parquet, because the existing subset API already provides this functionality there. In the future, we want to incorporate @zhztheplayer's work to allow slicing in the middle of a row group for Parquet, and possibly to allow CSV slices whose boundaries do not fall on line breaks.
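Because the CSV slices must currently line up with line breaks, a caller has to choose byte ranges that begin right after a newline and end right after one. The sketch below shows one way to do that. It is plain Python over raw bytes and does not call any Arrow API. `line_aligned_ranges` and `is_line_aligned` are hypothetical helpers, not part of Arrow.

The sketch has two limitations. It ignores the CSV header row, which only the first slice would contain. It also ignores quoted fields with embedded newlines, where a newline is not a row boundary.

```python
from typing import List, Tuple


def is_line_aligned(data: bytes, start: int, end: int) -> bool:
    """Return True if [start, end) starts at offset 0 or right after a newline,
    and ends at len(data) or right after a newline (hypothetical helper)."""
    starts_ok = start == 0 or data[start - 1:start] == b"\n"
    ends_ok = end == len(data) or data[end - 1:end] == b"\n"
    return starts_ok and ends_ok


def line_aligned_ranges(data: bytes, target_size: int) -> List[Tuple[int, int]]:
    """Split data into contiguous [start, end) ranges of roughly target_size
    bytes, extending each range to just past the next newline so that no
    line is cut in half (hypothetical helper; ignores headers and quoting)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    ranges: List[Tuple[int, int]] = []
    n = len(data)
    start = 0
    while start < n:
        tentative_end = min(start + target_size, n)
        # Find the first newline at or after the last byte of the tentative range.
        nl = data.find(b"\n", tentative_end - 1)
        # No newline left: the final range runs to the end of the data.
        end = n if nl == -1 else nl + 1
        ranges.append((start, end))
        start = end
    return ranges
```

For example, `line_aligned_ranges(b"a,b\n1,2\n3,4\n5,6\n", 5)` returns `[(0, 8), (8, 16)]`. Each of those ranges passes `is_line_aligned`, whereas a range such as `(2, 8)` starts mid-line and would be rejected.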
