[GitHub] [hadoop] steveloughran commented on pull request #5172: HADOOP-18543. AliyunOSSFileSystem#open(Path path, int bufferSize) use buffer size as its downloadPartSize

GitBox Thu, 01 Dec 2022 09:26:02 -0800


steveloughran commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334108437


   Sorry, but I'm going to say -1 to using the normal IO buffer size as the GET 
range. The default value of 4k is far too small even for Parquet/ORC reads; it 
would break all existing apps in performance terms (distcp, the Parquet library, 
Avro, ORC, everything), as they all use the default value.
   
   1. There is a configuration option for multipart download size, which is 
filesystem-wide. Not as flexible, but something everyone will expect to work.
   2. If you want better control of read policy, buffer sizes etc., then this 
connector needs to implement openFile(), as s3a and abfs do. That will let you 
add a new option to specify the range for GET calls.
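
   To put the 4k concern in numbers, here is a minimal sketch (not code from 
the PR) that counts the ranged GET requests needed to read a file sequentially 
for a given range size. The class name, the 128 MiB file length and the 
512 KiB part size are assumptions chosen for illustration only; the 4096 value 
is the 4k default mentioned above.

```java
// Hypothetical helper, not part of hadoop-aliyun: estimates how many ranged
// GET requests a sequential read needs for a given GET range size.
class GetRangeCost {

    /** Number of ranged GETs needed to read fileLength bytes in rangeSize chunks. */
    static long requestCount(long fileLength, long rangeSize) {
        if (fileLength < 0 || rangeSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and rangeSize > 0");
        }
        // Ceiling division: a final partial range still costs one request.
        return (fileLength + rangeSize - 1) / rangeSize;
    }

    public static void main(String[] args) {
        long fileLength = 128L * 1024 * 1024; // assumed 128 MiB file
        long ioBufferSize = 4096L;            // the 4k default IO buffer size
        long partSize = 512L * 1024;          // assumed part size, illustration only

        System.out.println("GETs with 4k range:      "
            + requestCount(fileLength, ioBufferSize));
        System.out.println("GETs with 512 KiB range: "
            + requestCount(fileLength, partSize));
    }
}
```

   Under these assumptions the 4k range needs 32768 requests against 256 for 
the larger range, so per-request latency would dominate the read.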


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
