Re: [PR] HADOOP-19211. AliyunOSS: Support vectored read API [hadoop]

via GitHub Wed, 17 Jul 2024 03:28:31 -0700


steveloughran commented on PR #6904:
URL: https://github.com/apache/hadoop/pull/6904#issuecomment-2232972993


   > ... it would be good to create a VectoredInputStream which takes the actual DataInputStream as input, with all the object stores like abfs, s3 and Aliyun extending this VectoredInputStream. Not really sure if this is feasible and will work.
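For illustration only, here is a minimal sketch of the idea in the quote above: a hypothetical base class `VectoredReadBase` whose subclasses supply just a positioned read (`readAt`), with a naive `readVectored` built on top that reads each range in turn and completes a per-range future. `Range`, `VectoredReadBase` and `InMemoryStream` are invented names, not Hadoop's `FileRange`/`PositionedReadable` API, and a real store would want to merge ranges and fetch them asynchronously.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical requested range; the future completes with the range's bytes. */
final class Range {
  final long offset;
  final int length;
  final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

  Range(long offset, int length) {
    this.offset = offset;
    this.length = length;
  }
}

/** Hypothetical shared base: each store implements only readAt(). */
abstract class VectoredReadBase {
  /** Read up to len bytes at position into buf[off..]; return the count, or -1 at EOF. */
  protected abstract int readAt(long position, byte[] buf, int off, int len)
      throws IOException;

  /** Naive vectored read: one blocking positioned read per range, in request order. */
  public void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate) {
    for (Range r : ranges) {
      try {
        byte[] tmp = new byte[r.length];
        int done = 0;
        while (done < r.length) {
          int n = readAt(r.offset + done, tmp, done, r.length - done);
          if (n < 0) {
            throw new EOFException("EOF in range at offset " + r.offset);
          }
          done += n;
        }
        ByteBuffer buffer = allocate.apply(r.length);
        buffer.put(tmp).flip();
        r.data.complete(buffer);
      } catch (IOException e) {
        // A failed range fails only its own future; later ranges are still read.
        r.data.completeExceptionally(e);
      }
    }
  }
}

/** In-memory "store" used to exercise the base class. */
final class InMemoryStream extends VectoredReadBase {
  private final byte[] contents;

  InMemoryStream(byte[] contents) {
    this.contents = contents;
  }

  @Override
  protected int readAt(long position, byte[] buf, int off, int len) {
    if (position >= contents.length) {
      return -1;
    }
    int n = (int) Math.min(len, contents.length - position);
    System.arraycopy(contents, (int) position, buf, off, n);
    return n;
  }
}
```

Because each range completes its own future, a caller can consume ranges as they arrive; a store-specific subclass could still override `readVectored` to coalesce ranges and issue them in parallel.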
   
   Mixed feelings.
   
   * abfs is the most advanced in terms of prefetching, block cache and openFile() support
   * classic s3a has vector IO and IOStatistics context support, but is reaching AOL.
   * I don't know about the others
   
   The s3a prefetch stream is not ready for real use; #5832 does a lot of this. I'd like that merged, if only to show some progress.
   
   If we were to do a new stream, I'd want:
   
   * a block structure underneath
   * openFile() length, read policies, and split start/end to frame the cache
   * a footer prefetch cache for ORC/Parquet files
   * unbuffer() to free the block cache
   * prefetching disabled for columnar formats opened with an openFile() read policy
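The wish list above could be sketched, very roughly, as the cache rules of a block-structured stream. Everything here is hypothetical: `BlockCachePlan`, `ReadPolicy` and `footerBytes` are invented for illustration, the file length and split bounds stand in for values that would arrive through openFile() options, and no actual fetching is shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read policies, loosely modelled on openFile() read policy names. */
enum ReadPolicy { SEQUENTIAL, RANDOM, COLUMNAR }

/**
 * Hypothetical cache rules for a block-structured stream: fixed-size blocks,
 * cached only inside the [splitStart, splitEnd) frame or the footer region.
 */
final class BlockCachePlan {
  private final long fileLength;  // e.g. the length passed to openFile(), avoiding a HEAD
  private final long splitStart;
  private final long splitEnd;
  private final int blockSize;
  private final ReadPolicy policy;
  private final int footerBytes;  // hypothetical footer prefetch size for ORC/Parquet
  private final Map<Long, byte[]> cache = new HashMap<>();

  BlockCachePlan(long fileLength, long splitStart, long splitEnd,
      int blockSize, ReadPolicy policy, int footerBytes) {
    this.fileLength = fileLength;
    this.splitStart = splitStart;
    this.splitEnd = splitEnd;
    this.blockSize = blockSize;
    this.policy = policy;
    this.footerBytes = footerBytes;
  }

  /** Index of the block containing a file offset. */
  long blockOf(long offset) {
    return offset / blockSize;
  }

  /** Columnar formats seek around, so read-ahead is switched off for them. */
  boolean shouldPrefetch() {
    return policy != ReadPolicy.COLUMNAR;
  }

  /** Start offset of the footer region to prefetch when the stream is opened. */
  long footerStart() {
    return Math.max(0, fileLength - footerBytes);
  }

  /** A block is cacheable if it lies in the file and overlaps the split frame or footer. */
  boolean cacheable(long block) {
    long start = block * blockSize;
    if (start >= fileLength) {
      return false;
    }
    long end = Math.min(start + blockSize, fileLength);
    boolean inSplit = end > splitStart && start < splitEnd;
    boolean inFooter = end > footerStart();
    return inSplit || inFooter;
  }

  /** Store a fetched block, silently ignoring blocks outside the frame. */
  void put(long block, byte[] data) {
    if (cacheable(block)) {
      cache.put(block, data);
    }
  }

  int cachedBlocks() {
    return cache.size();
  }

  /** unbuffer(): drop every cached block so the memory can be reclaimed. */
  void unbuffer() {
    cache.clear();
  }
}
```

Keeping the framing rules separate from the fetching code is what would make them shareable between store implementations.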
   
   
   What we should do is factor out the commonality and put it into hadoop-common.
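As one example of logic that every store's vector IO needs and that could be shared from hadoop-common, here is a hedged sketch of sorting requested ranges and coalescing neighbours whose gap is small enough that one larger read is likely cheaper than two. `FileSpan`, `CombinedRange`, `RangeMerger` and the `maxGap`/`maxSize` parameters are hypothetical; Hadoop's existing helpers may be structured differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical requested range: [offset, offset + length). */
final class FileSpan {
  final long offset;
  final int length;

  FileSpan(long offset, int length) {
    this.offset = offset;
    this.length = length;
  }

  long end() {
    return offset + length;
  }
}

/** One physical read covering one or more requested spans. */
final class CombinedRange {
  final long start;
  long end;
  final List<FileSpan> children = new ArrayList<>();

  CombinedRange(FileSpan first) {
    start = first.offset;
    end = first.end();
    children.add(first);
  }
}

final class RangeMerger {
  private RangeMerger() {}

  /**
   * Sort spans by offset, then fold each span into the previous read when the gap is
   * at most maxGap bytes and the combined read stays within maxSize bytes.
   * Overlapping spans (negative gap) also pass the gap test.
   */
  static List<CombinedRange> mergeRanges(List<FileSpan> spans, int maxGap, int maxSize) {
    List<FileSpan> sorted = new ArrayList<>(spans);
    sorted.sort(Comparator.comparingLong((FileSpan span) -> span.offset));
    List<CombinedRange> reads = new ArrayList<>();
    CombinedRange current = null;
    for (FileSpan s : sorted) {
      boolean fits = current != null
          && s.offset - current.end <= maxGap
          && Math.max(current.end, s.end()) - current.start <= maxSize;
      if (fits) {
        current.end = Math.max(current.end, s.end());
        current.children.add(s);
      } else {
        current = new CombinedRange(s);
        reads.add(current);
      }
    }
    return reads;
  }
}
```

The gap threshold trades extra bytes read against the per-request latency of an object store GET, so sensible values are likely to be store-specific.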
   
   On that note, if anyone could take up #6773 and #1747 to create contract tests for ByteBufferPositionedReadable, we could share them with all implementations of vector IO.
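A contract test of that kind might look roughly like the sketch below. It uses a hypothetical stand-in interface with the same shape as `ByteBufferPositionedReadable.read(long, ByteBuffer)` so it runs without Hadoop on the classpath. The checks (position advances by the bytes read, limit unchanged, -1 at end of file, 0 for an empty buffer) are assumed from that method's javadoc, and the real tests in #6773/#1747 may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-in shaped like ByteBufferPositionedReadable.read(long, ByteBuffer). */
interface PositionedByteBufferReader {
  int read(long position, ByteBuffer buf) throws IOException;
}

/** Reusable contract checks; any violation throws AssertionError. */
final class PositionedReadContract {
  private PositionedReadContract() {}

  /** Checks a reader over a file whose contents are the non-empty array expected. */
  static void check(PositionedByteBufferReader in, byte[] expected) {
    try {
      // 1. A read at offset 0 returns data and advances position by the count read.
      ByteBuffer buf = ByteBuffer.allocate(Math.min(16, expected.length));
      int limit = buf.limit();
      int n = in.read(0, buf);
      require(n > 0 && buf.position() == n, "position must advance by bytes read");
      require(buf.limit() == limit, "limit must be unchanged");
      for (int i = 0; i < n; i++) {
        require(buf.get(i) == expected[i], "data mismatch at offset " + i);
      }
      // 2. A read starting at end of file returns -1.
      require(in.read(expected.length, ByteBuffer.allocate(4)) == -1, "EOF must return -1");
      // 3. A buffer with no space remaining reads zero bytes.
      require(in.read(0, ByteBuffer.allocate(0)) == 0, "empty buffer must read 0 bytes");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void require(boolean ok, String message) {
    if (!ok) {
      throw new AssertionError(message);
    }
  }
}

/** Trivial in-memory implementation to run the contract against. */
final class ArrayReader implements PositionedByteBufferReader {
  private final byte[] data;

  ArrayReader(byte[] data) {
    this.data = data;
  }

  @Override
  public int read(long position, ByteBuffer buf) {
    if (!buf.hasRemaining()) {
      return 0;
    }
    if (position >= data.length) {
      return -1;
    }
    int n = (int) Math.min(buf.remaining(), data.length - position);
    buf.put(data, (int) position, n);
    return n;
  }
}
```

Each store's test suite would then only need to supply a reader over a known file and call the shared checks.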


