mapleFU commented on issue #41604: URL: https://github.com/apache/arrow/issues/41604#issuecomment-2111503540
> The buffer_size parameter in pyarrow_s3fs.open_input_stream(path_src, buffer_size=10_000_000) will do the same thing as pyarrow_s3fs.open_input_stream.

Hmmm, so the input buffer of s3fs is useless? It doesn't affect the underlying buffering?

> The default 65K might be a little bit too small for nowaday computers, e.g., when not using gz file, the default batch size for csv streaming is 1M

I agree that fixed-size buffering is weird, but I think the CompressedInputStream's buffer size is just the "decompress-input-buffer-size" rather than the "s3-io-size".

This could be added in C++ first, taking the chunk size as an input argument and using `max(kChunkSize, input-chunk-size)`?
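For context, a minimal Python sketch of what a user can already do today (not from the thread; the bucket path and region are placeholders): wrap the raw S3 stream in a `BufferedInputStream` so that the small fixed-size reads issued by `CompressedInputStream` are served from a larger in-memory read-ahead buffer instead of each going to S3.

```python
import pyarrow as pa
from pyarrow import fs, csv

# Hypothetical region and object path, for illustration only.
s3 = fs.S3FileSystem(region="us-east-1")
raw = s3.open_input_stream("my-bucket/data.csv.gz", compression=None)

# 10 MB read-ahead buffer between S3 and the decompressor, so the
# decompressor's small chunked reads are served from memory.
buffered = pa.BufferedInputStream(raw, buffer_size=10_000_000)
decompressed = pa.CompressedInputStream(buffered, compression="gzip")

table = csv.read_csv(decompressed)
```

Exposing the chunk size on the C++ `CompressedInputStream` itself, as suggested above, would let the decompressor issue larger reads directly and avoid the extra buffering layer.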
