steveloughran commented on PR #3559: URL: https://github.com/apache/parquet-java/pull/3559#issuecomment-4480988160
So for all cluster filesystems with vector IO, it's fine as is. HDFS doesn't support it, and for the cloud stores it's all ranged reads straight into allocated buffers. I think maybe in Hadoop we should just drop the attempt to be clever about merging ranges and just do the parallel reads. On clusterfs, work with Owen O'Malley and Claude to do the right thing here.

What would be the perceived penalty of reading the whole file block into one allocated buffer, copying the requested pieces into two separate buffers, and then releasing the larger one? (A release function is now passed down, after all.)
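To make the question concrete, here is a minimal sketch of that read-merge-copy-release pattern. It is not Hadoop or parquet-java code: `MergedRangeCopy`, `Range`, and `readMerged` are hypothetical names, a `byte[]` stands in for the single contiguous file read, and the `allocate` and `release` callbacks are assumed to have the `IntFunction<ByteBuffer>` and `Consumer<ByteBuffer>` shapes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Hypothetical sketch: one merged read, copied out into per-range buffers. */
final class MergedRangeCopy {

  /** Hypothetical range type: absolute file offset plus length in bytes. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Reads the span covering all ranges into one buffer, copies each requested
   * piece into its own buffer, then hands the large buffer back via release.
   * The byte[] "file" is an assumption standing in for a real positioned read.
   */
  static List<ByteBuffer> readMerged(
      byte[] file,
      List<Range> ranges,
      IntFunction<ByteBuffer> allocate,
      Consumer<ByteBuffer> release) {
    if (ranges.isEmpty()) {
      return new ArrayList<>();
    }
    // Work out the smallest contiguous span that covers every range.
    long start = Long.MAX_VALUE;
    long end = Long.MIN_VALUE;
    for (Range r : ranges) {
      start = Math.min(start, r.offset);
      end = Math.max(end, r.offset + r.length);
    }
    int mergedLength = (int) (end - start);

    // One large allocation and one contiguous "read" into it.
    ByteBuffer merged = allocate.apply(mergedLength);
    merged.put(file, (int) start, mergedLength);
    merged.flip();

    List<ByteBuffer> pieces = new ArrayList<>(ranges.size());
    try {
      for (Range r : ranges) {
        int relative = (int) (r.offset - start);
        // View of the merged buffer restricted to this range; no copy yet.
        ByteBuffer view = merged.duplicate();
        view.limit(relative + r.length);
        view.position(relative);
        // This is the extra copy the question is about.
        ByteBuffer piece = allocate.apply(r.length);
        piece.put(view);
        piece.flip();
        pieces.add(piece);
      }
    } finally {
      // The large buffer goes back to the caller's pool in every case.
      release.accept(merged);
    }
    return pieces;
  }
}
```

In this sketch the visible costs are one extra memory copy per requested range and a transient peak where the merged buffer and the per-range buffers are all live at once. Whether that matters in practice would depend on the gap between the ranges and on how the allocator behind `allocate` and `release` pools buffers.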
