mapleFU commented on issue #41604: URL: https://github.com/apache/arrow/issues/41604#issuecomment-2111503540
> The buffer_size parameter in pyarrow_s3fs.open_input_stream(path_src, buffer_size=10_000_000) will do the same thing as pyarrow_s3fs.open_input_stream.

Hmmm, so the input buffer of s3fs is useless? It doesn't affect the underlying buffering?

> The default 65K might be a little bit too small for nowaday computers, e.g., when not using gz file, the default batch size for csv streaming is 1M

I agree that fixed-size buffering is weird, but I think the CompressedInputStream's buffer size is just the "decompress-input-buffer-size" rather than the "s3-io-size".

This could be added in C++ first, taking the chunk size as an input argument and using `max(kChunkSize, input-chunk-size)`?
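For context, a minimal Python sketch of what a user can already do today (not from the thread; the bucket path and region are placeholders): wrap the raw S3 stream in a `BufferedInputStream` so that the small fixed-size reads issued by `CompressedInputStream` are served from a larger in-memory read-ahead buffer instead of each going to S3.

```python
import pyarrow as pa
from pyarrow import fs, csv

# Hypothetical region and object path, for illustration only.
s3 = fs.S3FileSystem(region="us-east-1")
raw = s3.open_input_stream("my-bucket/data.csv.gz", compression=None)

# 10 MB read-ahead buffer between S3 and the decompressor, so the
# decompressor's small chunked reads are served from memory.
buffered = pa.BufferedInputStream(raw, buffer_size=10_000_000)
decompressed = pa.CompressedInputStream(buffered, compression="gzip")

table = csv.read_csv(decompressed)
```

Exposing the chunk size on the C++ `CompressedInputStream` itself, as suggested above, would let the decompressor issue larger reads directly and avoid the extra buffering layer.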
