[GitHub] [arrow] Fokko commented on issue #33972: [Python] Remove redundant S3 call

via GitHub Thu, 02 Feb 2023 15:37:22 -0800


Fokko commented on issue #33972:
URL: https://github.com/apache/arrow/issues/33972#issuecomment-1414514062


   @westonpace sure thing!
   
   We need to apply projections, so we need the schema before loading the 
data. For example, if you rename a column in an Iceberg table, you don't want 
to rewrite your multi-petabyte table. Iceberg identifies columns by ID, so if 
you filter or project on a renamed column, it selects the old column name in 
files that were written before the rename.
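
   To make the rename case concrete, here is a minimal sketch of ID-based 
name resolution. It is not the pyiceberg implementation: the schemas are plain 
dicts, and the column names, field IDs, and the `resolve_projection` helper 
are all hypothetical.

```python
# Hypothetical, simplified schemas: field ID -> column name.
# Field 2 was renamed from "name" to "customer_name" at some point.
TABLE_SCHEMA = {1: "id", 2: "customer_name"}     # current table schema
OLD_FILE_SCHEMA = {1: "id", 2: "name"}           # file written before the rename
NEW_FILE_SCHEMA = {1: "id", 2: "customer_name"}  # file written after the rename


def resolve_projection(requested, table_schema, file_schema):
    """Map requested table column names to the names used in one data file.

    Columns are matched by field ID, never by name, so a renamed column
    resolves to whatever name the file was written with.
    """
    name_to_id = {name: field_id for field_id, name in table_schema.items()}
    resolved = {}
    for column in requested:
        field_id = name_to_id[column]  # raises KeyError for unknown columns
        if field_id in file_schema:
            resolved[column] = file_schema[field_id]
        # A field ID missing from the file is simply skipped in this sketch.
    return resolved


old_names = resolve_projection(["customer_name"], TABLE_SCHEMA, OLD_FILE_SCHEMA)
new_names = resolve_projection(["customer_name"], TABLE_SCHEMA, NEW_FILE_SCHEMA)
```

   The mapping depends on each file's own schema, which is why the schema 
has to be known before the projected columns can be read.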
   
   The current code is over here: 
https://github.com/apache/iceberg/blob/master/python/pyiceberg/io/pyarrow.py#L486-L522
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
