steveloughran commented on PR #6904: URL: https://github.com/apache/hadoop/pull/6904#issuecomment-2232972993
> , it would be good to create a VectoredInputStream which takes the actual DataInputStream as input and then all the Object stores like abfs, s3 and aliyun extending this VectoredInputStream. Not really sure if this is feasible and will work.

mixed feelings.
* abfs is the most advanced in terms of prefetch and block cache, openFile() support
* classic s3a does vector IO, IOStatistics context, but reaching AOL.
* don't know about the others

s3a prefetch stream is not ready for real use; #5832 does a lot of this. I'd like that in just to show some progress.

if we were to do a new stream, I'd want
* block structure underneath
* openFile() length, read policies, split start/end to frame the cache
* footer prefetch cache for orc/parquet files
* unbuffer() frees block cache
* prefetching disabled on columnar formats opened with an openFile() read policy

what we should do is factor out commonality and put it into common.

on that note, if anyone could take up #6773 and #1747 to create contract tests for ByteBufferPositionedReadable, we could share that with all impls of vector IO
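To make the "block structure underneath" and "unbuffer() frees block cache" items concrete, here is a minimal sketch of what I mean. It is not Hadoop code: `BlockCachedReader`, `RangeFetcher`, `cachedBlockCount()` and `fetchCount()` are hypothetical names made up for illustration, and the fetcher stands in for whatever ranged GET a store client would issue. There is no prefetching, footer cache, or thread safety here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a block-cached positioned reader; not a Hadoop class. */
class BlockCachedReader {

    /** Stand-in (assumption) for a ranged GET against an object store. */
    @FunctionalInterface
    interface RangeFetcher {
        /** Must return exactly {@code length} bytes starting at {@code offset}. */
        byte[] fetch(long offset, int length);
    }

    private final RangeFetcher fetcher;
    private final long fileLength;   // known up front, e.g. passed in at open time
    private final int blockSize;
    private final Map<Long, byte[]> blocks = new HashMap<>();
    private int fetchCount = 0;

    BlockCachedReader(RangeFetcher fetcher, long fileLength, int blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("bad length or block size");
        }
        this.fetcher = fetcher;
        this.fileLength = fileLength;
        this.blockSize = blockSize;
    }

    /** Positioned read; returns the number of bytes copied, or -1 at or past EOF. */
    int read(long position, byte[] dest, int destOffset, int length) {
        if (position < 0) {
            throw new IllegalArgumentException("negative position");
        }
        if (position >= fileLength) {
            return -1;
        }
        int toRead = (int) Math.min(length, fileLength - position);
        int done = 0;
        while (done < toRead) {
            long pos = position + done;
            byte[] block = getBlock(pos / blockSize);
            int inBlock = (int) (pos % blockSize);
            int n = Math.min(toRead - done, block.length - inBlock);
            System.arraycopy(block, inBlock, dest, destOffset + done, n);
            done += n;
        }
        return toRead;
    }

    /** Fetch a block on first use and keep it until unbuffer() is called. */
    private byte[] getBlock(long index) {
        byte[] block = blocks.get(index);
        if (block == null) {
            long start = index * blockSize;
            int len = (int) Math.min(blockSize, fileLength - start);
            block = fetcher.fetch(start, len);
            if (block.length != len) {
                throw new IllegalStateException("short fetch at offset " + start);
            }
            fetchCount++;
            blocks.put(index, block);
        }
        return block;
    }

    /** Release every cached block; the reader stays usable and refetches on demand. */
    void unbuffer() {
        blocks.clear();
    }

    int cachedBlockCount() {
        return blocks.size();
    }

    int fetchCount() {
        return fetchCount;
    }
}
```

The point of the shape: the file length is supplied by the caller rather than probed, every remote read is block-aligned, and unbuffer() drops the cached blocks without closing the reader, so an idle stream holds no buffers.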
