steveloughran commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334108437
Sorry, but I'm going to say -1 to using the normal IO buffer size as the GET range. The default value of 4k is way too small even for Parquet/ORC reads; it will break all existing apps in performance terms: distcp, the Parquet library, Avro, ORC, everything, as they all use the default value.

1. There is a configuration option for multipart download size, which is filesystem-wide. Not as flexible, but something everyone will expect to work.
2. If you want better control of read policy, buffer sizes, etc., then this connector needs to implement openFile(), as s3a and abfs do. That will let you add a new option to specify the range for GET calls.
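
To put a rough number on why a 4k GET range hurts, here is a minimal sketch that counts the requests needed to read a file sequentially. It assumes one GET per range and models no readahead or caching; the class name `GetRangeCost`, the 1 GiB file size, and the 512 KiB comparison range are illustrative choices, not values taken from this connector.

```java
// Sketch: count the GET requests needed to read a file sequentially
// when every request fetches at most rangeBytes bytes.
// Assumption: one GET per range; readahead and caching are not modelled.
final class GetRangeCost {

    /** Number of ranged GETs needed to cover fileBytes (ceiling division). */
    static long requests(long fileBytes, long rangeBytes) {
        if (fileBytes < 0 || rangeBytes <= 0) {
            throw new IllegalArgumentException(
                "fileBytes must be >= 0 and rangeBytes must be > 0");
        }
        return (fileBytes + rangeBytes - 1) / rangeBytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;          // illustrative file size
        long ioBuffer = 4L * 1024;       // the 4k default IO buffer size
        long largerRange = 512L * 1024;  // hypothetical range, for comparison only

        System.out.println("4 KiB ranges:   " + requests(oneGiB, ioBuffer));    // 262144
        System.out.println("512 KiB ranges: " + requests(oneGiB, largerRange)); // 2048
    }
}
```

Under these assumptions the 4k range needs 128 times as many round trips as the 512 KiB one for the same file, and every application that leaves the buffer size at its default would pay that cost.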
