amoeba commented on issue #38032: URL: https://github.com/apache/arrow/issues/38032#issuecomment-1767088848

Hi @kostovasandra, thanks for your patience. Support for modifying the S3 log level was [just merged](https://github.com/apache/arrow/pull/38267), so that should go into the Arrow 15 release.

Until then, what would really help is a way for us to reproduce your issue ourselves. Could you share the Parquet file, or write and share a script that generates a Parquet file that reproduces the issue? I tested with a 17 million row (~600 MB) Parquet file on S3 and got almost exactly the same timing (~17 sec) in both R and Python against `us-west-2`. So while tweaking Parquet reader options might help, it seems to me that some characteristic of your custom S3 endpoint or your file may be involved here.
Hi @kostovasandra, thanks for the patience. Support for modifying the S3 log level was [just merged](https://github.com/apache/arrow/pull/38267) so that should go into the Arrow 15 release. Until then, what could really help would be if you could give us a way to reproduce your issue ourselves. Is it possible to share the Parquet file or for you to write a script that generates a Parquet file that reproduces the issue that you could share? I did a test with a 17million row (~600MB) Parquet file on S3 and I get almost the exact timing (~17sec) in either R or Python against `us-west-2`. So while tweaking Parquet reader options might help, it seems to me like some characteristic of your custom S3 endpoint or your file may be involved here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org