etseidl commented on issue #16200:
URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928051534

   > One challenge / tradeoff that would be interesting/required is that doing 
another async load to read more of the metadata will be very bad if that has to 
actually go to object store again.
   
   Yes, this has me very worried. The column index is laid out by row 
group, then by column, so reading just a single column requires jumping around 
quite a bit if there are many row groups. Also, if there is no projection 
involved, the entire offset index will be read as well. This will need some 
careful testing to see if multiple fetches are worthwhile, or if doing a single 
fetch with a range large enough to include all column and offset indexes needed 
(and then only _parsing_ the needed indexes) would be better. 
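
   To make the tradeoff concrete, here is a rough sketch of the byte-range arithmetic behind the two strategies. It does not use the `parquet` crate: `IndexRange`, `column_index_ranges`, `coalesce`, and `covering_range` are all hypothetical names, and the fixed `entry_len` is a simplifying assumption (real column indexes vary in size, and their locations come from each column chunk's `column_index_offset` / `column_index_length` metadata).

```rust
/// Byte range of one column index entry in the file (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexRange {
    offset: u64,
    length: u64,
}

/// Ranges holding the column index of `column` in every row group, assuming
/// entries are stored by row group, then column, each `entry_len` bytes long.
/// The fixed entry size is an assumption made only for illustration.
fn column_index_ranges(
    base: u64,
    num_row_groups: u64,
    num_columns: u64,
    entry_len: u64,
    column: u64,
) -> Vec<IndexRange> {
    (0..num_row_groups)
        .map(|rg| IndexRange {
            offset: base + (rg * num_columns + column) * entry_len,
            length: entry_len,
        })
        .collect()
}

/// Strategy 1: several fetches. Ranges separated by at most `max_gap` bytes
/// are merged, so the result length is the number of requests to issue.
fn coalesce(mut ranges: Vec<IndexRange>, max_gap: u64) -> Vec<IndexRange> {
    ranges.sort_by_key(|r| r.offset);
    let mut merged: Vec<IndexRange> = Vec::new();
    for r in ranges {
        if let Some(prev) = merged.last_mut() {
            let prev_end = prev.offset + prev.length;
            if r.offset <= prev_end + max_gap {
                // Close enough to the previous range: extend it instead of
                // issuing another request.
                let end = prev_end.max(r.offset + r.length);
                prev.length = end - prev.offset;
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Strategy 2: a single fetch spanning every needed entry. Only the needed
/// entries would then be parsed; the bytes in between are read but ignored.
fn covering_range(ranges: &[IndexRange]) -> Option<IndexRange> {
    let start = ranges.iter().map(|r| r.offset).min()?;
    let end = ranges.iter().map(|r| r.offset + r.length).max()?;
    Some(IndexRange { offset: start, length: end - start })
}
```

   For example, with 3 row groups, 4 columns, and 100-byte entries, reading one column means either three 100-byte requests or one 900-byte request. Which is cheaper depends on object store latency versus the cost of the extra bytes, which is what the testing would need to establish.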


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

