nyxtom commented on PR #13938: URL: https://github.com/apache/iceberg/pull/13938#issuecomment-3234550232
> Ah, I think the issue is that in our code in the library we assume that the Parquet Reader already has a projection which only selects those columns which need to be read prior to opening the file. We have to do this anyway because we have to map the names in the schema to the names in the file based on field IDs.
>
> https://github.com/apache/iceberg/blob/cf74b65230f7275654221bf87eb92c1c78248cdc/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedReaderBuilder.java#L171-L174
>
> It seems like we aren't doing a similar thing with the Arrow reader?
>
> Is that on track? I'm trying to figure this out, but I think ideally we just don't try to read null vectors at all at a higher level?

That sounds correct.
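A minimal Java sketch of the idea discussed above: the expected (table) schema is resolved against the columns physically present in a data file by field ID before the file is opened. Only matched columns are handed to the reader. Columns the file lacks are marked to be filled with nulls, so no null vector ever has to be "read". Everything here is a simplified, hypothetical model, not Iceberg's API or the code linked above. The names `FieldIdProjection`, `Field`, `ColumnPlan`, `plan`, and `columnsToRead` are invented for illustration, and the file's columns are assumed to be given as a map from field ID to physical column name. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Iceberg's API: resolve the expected (table) schema
// against a data file's columns by field ID before the file is opened.
final class FieldIdProjection {

  // A column as the current table schema names it.
  record Field(int id, String name) {}

  // How one expected column will be produced; fileColumnName is null when the
  // file has no column with this field ID.
  record ColumnPlan(Field expected, String fileColumnName) {
    boolean readFromFile() {
      return fileColumnName != null;
    }
  }

  // Match each expected field to the file column carrying the same field ID.
  // Names are ignored for matching, so a renamed column still resolves.
  static List<ColumnPlan> plan(List<Field> expected, Map<Integer, String> fileColumnsById) {
    List<ColumnPlan> plans = new ArrayList<>();
    for (Field field : expected) {
      plans.add(new ColumnPlan(field, fileColumnsById.get(field.id())));
    }
    return plans;
  }

  // The only physical columns the reader needs to open. Columns missing from
  // the file do not appear here, so no reader is created for them.
  static List<String> columnsToRead(List<ColumnPlan> plans) {
    List<String> names = new ArrayList<>();
    for (ColumnPlan p : plans) {
      if (p.readFromFile()) {
        names.add(p.fileColumnName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Current table schema: field 2 was renamed "name" -> "full_name",
    // and field 3 ("email") was added after the file was written.
    List<Field> expected =
        List.of(new Field(1, "id"), new Field(2, "full_name"), new Field(3, "email"));

    // Columns physically present in the (older) file, keyed by field ID.
    Map<Integer, String> fileColumns = new LinkedHashMap<>();
    fileColumns.put(1, "id");
    fileColumns.put(2, "name");
    fileColumns.put(4, "notes"); // in the file but not requested: never opened

    List<ColumnPlan> plans = plan(expected, fileColumns);
    System.out.println(columnsToRead(plans)); // prints [id, name]
    for (ColumnPlan p : plans) {
      String action =
          p.readFromFile() ? "read file column " + p.fileColumnName() : "fill with nulls, no read";
      System.out.println(p.expected().name() + " -> " + action);
    }
  }
}
```

Running `main` prints `[id, name]`, followed by one line per expected column. `id` and `full_name` are read from the file columns `id` and `name`, while `email` is filled with nulls without touching the file. Matching on field IDs rather than names is what lets the renamed column resolve. Deciding this up front keeps the reader itself from ever seeing columns that are not in the file.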
