[GitHub] [iceberg] edgarRd opened a new issue #1206: Metrics are not stored in DataFile when imported via `SparkTableUtil#listPartitions` if no Iceberg IDs are present

GitBox Tue, 14 Jul 2020 17:51:50 -0700


edgarRd opened a new issue #1206:
URL: https://github.com/apache/iceberg/issues/1206



   When migrating a table to Iceberg and importing existing files (e.g. ORC 
files) via `SparkTableUtil#listPartitions`, the imported files most likely do 
not have Iceberg field IDs stored; at least this is the case for ORC. As a 
result, metrics computation for these files is skipped entirely and no metrics 
are stored in the resulting `DataFile`.
   
   We could use NameMapping to match the destination table schema in Iceberg to 
the column names in the existing files. This would let users migrate to 
Iceberg and still get metrics computed, which ultimately improves query 
performance (for example, by allowing files to be skipped based on column 
statistics).
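
To make the idea concrete, here is a minimal sketch in plain Java. It does not use Iceberg's actual NameMapping classes: `NameMappingSketch`, `metricsById`, and the sample column names and IDs are all hypothetical, chosen only to illustrate the matching step. The assumption is that per-column statistics read from an imported file are keyed by column name, and that a name-to-ID mapping derived from the destination table schema can re-key them by field ID.

```java
import java.util.HashMap;
import java.util.Map;

public class NameMappingSketch {

    // Hypothetical helper (not an Iceberg API): re-keys per-column statistics
    // from column name to field ID using a name-to-ID mapping.
    // Columns with no entry in the mapping are dropped.
    static Map<Integer, Long> metricsById(Map<String, Integer> nameToId,
                                          Map<String, Long> metricsByName) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : metricsByName.entrySet()) {
            Integer fieldId = nameToId.get(entry.getKey());
            if (fieldId != null) {
                result.put(fieldId, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed mapping derived from the destination table schema.
        Map<String, Integer> nameToId = new HashMap<>();
        nameToId.put("id", 1);
        nameToId.put("data", 2);

        // Assumed value counts read from an imported file that has no field IDs.
        Map<String, Long> valueCounts = new HashMap<>();
        valueCounts.put("id", 100L);
        valueCounts.put("data", 100L);
        valueCounts.put("legacy_col", 100L); // not in the table schema

        System.out.println(metricsById(nameToId, valueCounts));
    }
}
```

In this sketch, columns that the mapping does not know about are simply ignored, so metrics would be recorded only for fields present in the destination schema. Whether a real implementation should behave that way is a design question left open here.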


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



