steveloughran commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1135585289

   > I was working with s3a
   > Spark 3.2.1
   > Hadoop (Hadoop-aws) 3.3.2
   > AWS SDK 1.11.655
   
   Thanks. That means you are current with all shipping improvements. The main 
one extra is to use openFile(), passing in the length and requesting random IO. This 
guarantees ranged GET requests and cuts the initial HEAD probe for the 
existence/size of the file.
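
   A minimal sketch of that openFile() call, for illustration only. The class and method names are hypothetical. The option keys are assumptions about which Hadoop release is on the classpath: the `fs.option.openfile.*` keys are the standard names in newer Hadoop releases, and `fs.s3a.experimental.input.fadvise` is the older s3a-specific key. All are set with opt(), so a connector that does not recognise a key simply ignores it.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: open a file for random IO, declaring its length up front. */
public final class RandomIOOpen {

  private RandomIOOpen() {
  }

  public static FSDataInputStream open(FileSystem fs, Path path, long knownLength)
      throws IOException {
    try {
      return fs.openFile(path)
          // Assumed key: declares the length so the store need not probe for it.
          // Passed as a String to avoid any numeric overload ambiguity.
          .opt("fs.option.openfile.length", Long.toString(knownLength))
          // Assumed key: standard read-policy option in newer Hadoop releases.
          .opt("fs.option.openfile.read.policy", "random")
          // Older s3a-specific way of requesting random IO.
          .opt("fs.s3a.experimental.input.fadvise", "random")
          .build()   // CompletableFuture<FSDataInputStream>
          .get();    // block until the stream is opened
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      InterruptedIOException ioe =
          new InterruptedIOException("Interrupted while opening " + path);
      ioe.initCause(e);
      throw ioe;
    } catch (ExecutionException e) {
      // Unwrap so callers see the underlying IOException where there is one.
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException("Failed to open " + path, cause);
    }
  }
}
```

   A reader that already knows the file length (for example from an earlier listing) would call `RandomIOOpen.open(fs, path, length)` instead of `fs.open(path)`. Whether the HEAD request is actually skipped depends on the connector version in use.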
   
   >> have you benchmarked this change with abfs or google gcs connectors to 
see what difference it makes there?
   
   > No I have not. Would love help from anyone in the community with access to 
these. I only have access to S3.
   
   That I have. FWIW, with the right tuning of abfs prefetch (4 threads, 128 MB 
blocks) I can get the full FTTH link rate from a remote store: 700 Mbit/s. That's 
to the base station; once you add wifi the bottlenecks move.


