Fokko commented on issue #33972: URL: https://github.com/apache/arrow/issues/33972#issuecomment-1414514062
@westonpace sure thing! We need to apply projections, and for that we need the schema before loading the data. For example, if you rename a column in an Iceberg table, you don't want to rewrite your multi-petabyte table. Iceberg identifies columns by ID rather than by name, so if you filter or project on that column, it will select the column under its old name in the files that were written before the rename. The current code is here: https://github.com/apache/iceberg/blob/master/python/pyiceberg/io/pyarrow.py#L486-L522
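A minimal sketch of the ID-based projection described above. It uses plain Python so it runs without pyarrow. `Field`, `project_by_id` and `read_projected` are hypothetical names for illustration and are not the pyiceberg API. Columns are modeled as a dict of lists rather than Arrow arrays, and filtering is not shown. The point is that the old-name/new-name mapping is resolved from the file's schema alone, before any data is read:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical schema column: a stable Iceberg-style field ID plus its name."""
    field_id: int
    name: str


def project_by_id(
    table_schema: Sequence[Field],
    file_schema: Sequence[Field],
    selected_names: Sequence[str],
) -> Dict[str, Optional[str]]:
    """Map each requested column (by its current name) to the name it has in
    one data file, matching on field ID instead of on name.

    Returns None for columns the file does not contain (e.g. added later).
    """
    id_by_current_name = {f.name: f.field_id for f in table_schema}
    file_name_by_id = {f.field_id: f.name for f in file_schema}
    return {
        name: file_name_by_id.get(id_by_current_name[name])
        for name in selected_names
    }


def read_projected(
    file_columns: Dict[str, List],
    table_schema: Sequence[Field],
    file_schema: Sequence[Field],
    selected_names: Sequence[str],
) -> Dict[str, List]:
    """Return the selected columns of one file under their current names;
    columns missing from the file are filled with None."""
    # The mapping needs only the file's schema, not its data.
    mapping = project_by_id(table_schema, file_schema, selected_names)
    num_rows = len(next(iter(file_columns.values()), []))
    out: Dict[str, List] = {}
    for current_name, file_name in mapping.items():
        if file_name is None:
            out[current_name] = [None] * num_rows
        else:
            out[current_name] = file_columns[file_name]
    return out


# Table schema after renaming field 2 from "name" to "customer_name".
table_schema = [Field(1, "id"), Field(2, "customer_name"), Field(3, "email")]
# Schema of a file written before the rename (and before "email" existed).
file_schema = [Field(1, "id"), Field(2, "name")]
old_file = {"id": [1, 2], "name": ["alice", "bob"]}

print(project_by_id(table_schema, file_schema, ["customer_name"]))
# → {'customer_name': 'name'}
print(read_projected(old_file, table_schema, file_schema, ["id", "customer_name", "email"]))
# → {'id': [1, 2], 'customer_name': ['alice', 'bob'], 'email': [None, None]}
```

In real Iceberg data files the IDs are stored in the Parquet schema (pyarrow surfaces them as `PARQUET:field_id` field metadata). This is why the file schema has to be available before the data is loaded.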
