friendlymatthew commented on PR #20822:
URL: https://github.com/apache/datafusion/pull/20822#issuecomment-4025466833

> > > This looks great to me!
> > > How do we generate the right ProjectionMask / translate into the right leaf column index in Parquet? I don't see that added anywhere but maybe the existing code already did that correctly?
> >
> > We do not. I added a note in the PR message:
> > > Note: this does not address the projection side and should not be blocked by it. SELECT s['foo'] still reads the entire struct rather than just the needed leaf column. That requires separate changes to how the opener builds its projection mask.
> >
> > I have some ideas about this, and will push up a follow-up PR
>
> Wording: it's a bit confusing here, because projection can mean one of two things: the select part of the query, or the projection of the columns that the filters need in order to be evaluated. I assume here you are referring to the latter, i.e. although we support these filters now, we still read the entire struct column and then apply the get field operation in memory?

Yes, exactly. Filters on struct fields are now pushed down to the row-level filter, but the projection side still reads the entire struct column. I still need to teach DataFusion to project only the needed subcolumns (leaf columns) of the struct rather than materializing the whole thing.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
