Re: [I] Parquet deserialization speeds slower on Linux [arrow]

via GitHub Wed, 25 Oct 2023 05:38:08 -0700


mrocklin commented on issue #38389:
URL: https://github.com/apache/arrow/issues/38389#issuecomment-1779182179


   I think that those files were relatively small.  Unfortunately we've since 
replaced them with larger files.  I'll re-run everything and include table size 
numbers as well, hopefully within the next hour.
   
   > And some machine might have different bandwidth, this should also be taken 
into account
   
   Certainly this is true.  This is why I record S3 bandwidth at the beginning.
That gives us a baseline.  My goal isn't to get a high MiB/s number.  It's to
get a full pipeline that is close to S3 bandwidth.  I hope that S3, rather than
pyarrow.parquet, is the bottleneck.
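
   A rough sketch of the comparison described above, not the actual benchmark
code: time a raw read to get a baseline, time the full pipeline, and report the
pipeline as a fraction of the baseline.  The helper names (`throughput_mib_s`,
`fraction_of_baseline`) and the `read_fn` callables are hypothetical; in a real
run they would wrap the raw S3 download and the pyarrow.parquet read.

```python
import time

MIB = 1024 * 1024


def throughput_mib_s(read_fn):
    """Time read_fn(), which must return the number of bytes it processed.

    read_fn is a hypothetical stand-in: for the baseline it would download
    raw bytes from S3, for the pipeline it would also decode the Parquet data.
    """
    start = time.perf_counter()
    nbytes = read_fn()
    # Guard against a zero interval on very fast stand-in functions.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return nbytes / MIB / elapsed


def fraction_of_baseline(pipeline_mib_s, baseline_mib_s):
    """Return how close the pipeline gets to the baseline (1.0 = equal)."""
    return pipeline_mib_s / baseline_mib_s
```

   A value near 1.0 would suggest that S3 is the limiting factor; a much
smaller value would point at the deserialization step instead.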


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
