etseidl commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928051534
> One challenge / tradeoff that would be interesting/required is that doing another async load to read more of the metadata will be very bad if that has to actually go to object store again.

Yes, this has me very worried. The layout of the column index is by row group, then column, so reading the index for just a single column requires jumping around quite a bit if there are many row groups. Also, if there is no projection involved, the entire offset index will be read as well. This will need some careful testing to see whether multiple fetches are worthwhile, or whether a single fetch with a range large enough to include all needed column and offset indexes (and then _parsing_ only the needed ones) would be better.
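To make the tradeoff concrete, here is a rough sketch of the byte ranges involved when only one column's index is wanted. It is not DataFusion or parquet-rs code: `synthetic_column_index_layout`, `projected_ranges`, `covering_range`, and `coalesce` are hypothetical helpers, and the layout assumes fixed-size index entries stored contiguously by row group and then column, which real files will not match exactly.

```rust
use std::ops::Range;

/// Build a synthetic layout: `layout[row_group][column]` is the byte range of
/// that column chunk's column index. Entries are assumed to be fixed-size and
/// stored contiguously, ordered by row group and then by column.
fn synthetic_column_index_layout(
    base: u64,
    row_groups: u64,
    columns: u64,
    entry_len: u64,
) -> Vec<Vec<Range<u64>>> {
    (0..row_groups)
        .map(|rg| {
            (0..columns)
                .map(|col| {
                    let start = base + (rg * columns + col) * entry_len;
                    start..start + entry_len
                })
                .collect()
        })
        .collect()
}

/// Ranges that must be read to get the column index of one projected column
/// across every row group. These are scattered, one per row group.
fn projected_ranges(layout: &[Vec<Range<u64>>], column: usize) -> Vec<Range<u64>> {
    layout.iter().map(|rg| rg[column].clone()).collect()
}

/// Smallest single range that covers all of `ranges` (the "one big fetch").
fn covering_range(ranges: &[Range<u64>]) -> Option<Range<u64>> {
    let start = ranges.iter().map(|r| r.start).min()?;
    let end = ranges.iter().map(|r| r.end).max()?;
    Some(start..end)
}

/// Merge ranges separated by at most `max_gap` bytes. The length of the
/// result is the number of separate fetches that would be issued.
fn coalesce(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end.saturating_add(max_gap) {
                // Close enough to the previous range: extend it instead of
                // paying for another round trip.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

With 3 row groups, 4 columns, and 100-byte entries, projecting one column needs 300 bytes spread over 3 separate ranges, while the covering range is 900 bytes in a single request. Which is cheaper depends on per-request latency versus the bytes over-read, which is exactly what would need to be measured.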