steveloughran commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1135585289
> I was working with s3a
> Spark 3.2.1
> Hadoop (Hadoop-aws) 3.3.2
> AWS SDK 1.11.655

Thanks. That means you are current with all shipping improvements. The main extra one is to use `openFile()`, passing in the file length and requesting random IO. This guarantees ranged GET requests and cuts the initial HEAD probe for the existence/size of the file.

>> have you benchmarked this change with abfs or google gcs connectors to see what difference it makes there?

> No I have not. Would love help from anyone in the community with access to these. I only have access to S3.

That access I have. FWIW, with the right tuning of abfs prefetch (4 threads, 128 MB blocks) I can get the full FTTH link rate from a remote store: 700 Mbit/s. That's to the base station; once you add Wi-Fi, the bottlenecks move.
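
For reference, the `openFile()` suggestion could look roughly like the sketch below. It is an illustration under assumptions, not code from this PR: the class name `OpenFileSketch` and the helper `readRange` are hypothetical, and the option keys `fs.option.openfile.read.policy` and `fs.option.openfile.length` are, to my knowledge, the standardized names from Hadoop releases later than 3.3.2. They are passed with `opt()`, so a filesystem that does not recognize them should simply ignore them.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper showing the openFile() builder with a known length and random IO. */
public final class OpenFileSketch {

  private OpenFileSketch() {
  }

  /**
   * Read {@code count} bytes starting at {@code offset}.
   * {@code fileLength} is assumed to be known already (for example from a
   * directory listing), which is what lets the store skip its own size probe.
   */
  public static byte[] readRange(Configuration conf, Path path,
      long fileLength, long offset, int count) throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataInputStream in = fs.openFile(path)
        // Assumed option keys (see the note above); opt() marks them as optional hints.
        .opt("fs.option.openfile.read.policy", "random")
        .opt("fs.option.openfile.length", Long.toString(fileLength))
        .build()   // returns a CompletableFuture<FSDataInputStream>
        .get()) {  // block until the stream is available
      byte[] buffer = new byte[count];
      // Positioned read: does not move the stream's current position.
      in.readFully(offset, buffer, 0, count);
      return buffer;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while opening " + path, e);
    } catch (ExecutionException e) {
      throw new IOException("Failed to open " + path, e.getCause());
    }
  }
}
```

A Parquet reader would typically call something like this once for the footer and then once per column chunk; whether the HEAD request is actually skipped depends on the connector and Hadoop version in use.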