[GitHub] [arrow-rs] zeevm commented on pull request #4156: Cleanup ChunkReader (#4118)

via GitHub Thu, 18 May 2023 11:30:35 -0700


zeevm commented on PR #4156:
URL: https://github.com/apache/arrow-rs/pull/4156#issuecomment-1553459656


   @tustvold I think the length argument was quite useful. My main use case is 
processing Parquet files from cloud storage (Azure, S3, etc.), so I try to 
minimize the amount of data downloaded to save both time and cost. I used the 
length argument (extended if it fell below some threshold) to download only 
what the reader required in a single 'read' op (cloud storage services 
usually charge per 10K ops plus bandwidth).
   
   Now I don't know how much to download, so I'll have to develop a bunch of 
heuristics to arrive at sensible values.
   
   Since the reader always needs some specific entity (say a page, an entire 
column chunk, or the file metadata), it knows what 'length' it needs. Why not 
provide this as a 'hint' to the implementation?
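
   To make the suggestion concrete, here is a rough sketch of what such a 
hint could look like. Everything in it is hypothetical and is not the actual 
arrow-rs API: the trait `HintedChunkReader`, the helper `plan_fetch`, the 
`min_fetch` threshold and the in-memory store standing in for a cloud object 
are names made up for illustration.

```rust
use std::cell::Cell;
use std::io;
use std::ops::Range;

/// Hypothetical helper: decide which byte range to fetch in one request.
/// The reader's hint is extended up to `min_fetch` when it is smaller (or
/// absent), and the result is clamped to the end of the file.
fn plan_fetch(offset: u64, length_hint: Option<u64>, min_fetch: u64, file_len: u64) -> Range<u64> {
    let start = offset.min(file_len);
    let wanted = length_hint.unwrap_or(min_fetch).max(min_fetch);
    let end = start.saturating_add(wanted).min(file_len);
    start..end
}

/// Hypothetical trait (not the real `ChunkReader`): the caller passes the
/// number of bytes it expects to need as an optional hint.
trait HintedChunkReader {
    fn get_bytes(&self, offset: u64, length_hint: Option<u64>) -> io::Result<Vec<u8>>;
}

/// In-memory stand-in for a remote object; counts "read" ops so the cost
/// of a given access pattern can be inspected.
struct InMemoryStore {
    data: Vec<u8>,
    min_fetch: u64,
    requests: Cell<u32>,
}

impl HintedChunkReader for InMemoryStore {
    fn get_bytes(&self, offset: u64, length_hint: Option<u64>) -> io::Result<Vec<u8>> {
        let range = plan_fetch(offset, length_hint, self.min_fetch, self.data.len() as u64);
        // One call corresponds to one billable request.
        self.requests.set(self.requests.get() + 1);
        Ok(self.data[range.start as usize..range.end as usize].to_vec())
    }
}
```

   With a hint, an implementation like this can fetch a page or a column 
chunk in a single request; without one, it can only fall back on a threshold 
such as `min_fetch`.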


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
