ConeyLiu commented on PR #4898: URL: https://github.com/apache/iceberg/pull/4898#issuecomment-1210127293
> I think the situation would be the same even in your proposal to add new schemaid field to data_file, right? After rewriteDataFiles we have to carry over the latest schema-id of each spec, in order for your initial proposed optimization to be accurate? Because there may be data in the new file that was written by a later schema.

You are correct. After a rewrite, the data files are written with the new spec, so we cannot benefit from the schema evaluation because the original schema information is lost.

> As far as I can tell, it seems to be the right one that the manifest was written in, even after rewriteManifests.

In RewriteManifests, we use the current table partition [spec](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java#L103) or the spec selected by spec ID. I think the schema used by the current spec is not the same as the original schema of the old manifest file. That's because we [rewrite the partition spec](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L952) when updating the table schema. Please correct me if I am wrong.
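For context, here is a rough sketch of how the manifest rewrite is typically driven through the Spark actions API (the table path and the size threshold below are only illustrative assumptions, not from this PR). The point relevant to the discussion above is that the action plans the new manifests against the table's current partition spec, not the spec/schema the old manifests were originally written with.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class RewriteManifestsExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("rewrite-manifests-example")
        .getOrCreate();

    // Hypothetical table location; in practice the table would usually be loaded via a catalog.
    Table table = new HadoopTables(spark.sparkContext().hadoopConfiguration())
        .load("/path/to/warehouse/db/table");

    // Rewrite small manifests. The rewritten manifests are planned against the table's
    // *current* partition spec, so the original spec/schema association of the old
    // manifests is not carried over.
    SparkActions.get(spark)
        .rewriteManifests(table)
        .rewriteIf(manifest -> manifest.length() < 10 * 1024 * 1024) // illustrative threshold
        .execute();
  }
}
```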
