amoeba commented on issue #38032: URL: https://github.com/apache/arrow/issues/38032#issuecomment-1767088848
Hi @kostovasandra, thanks for the patience. Support for modifying the S3 log level was [just merged](https://github.com/apache/arrow/pull/38267) so that should go into the Arrow 15 release. Until then, what could really help would be if you could give us a way to reproduce your issue ourselves. Is it possible to share the Parquet file or for you to write a script that generates a Parquet file that reproduces the issue that you could share? I did a test with a 17million row (~600MB) Parquet file on S3 and I get almost the exact timing (~17sec) in either R or Python against `us-west-2`. So while tweaking Parquet reader options might help, it seems to me like some characteristic of your custom S3 endpoint or your file may be involved here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org