rdblue commented on a change in pull request #675: Inherit snapshot ids for manifest entries
URL: https://github.com/apache/incubator-iceberg/pull/675#discussion_r373281555
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
 ##########
 @@ -203,17 +211,29 @@ protected void add(DataFile file) {
    * Add all files in a manifest to the new snapshot.
    */
   protected void add(ManifestFile manifest) {
-    // the manifest must be rewritten with this update's snapshot ID
-    try (ManifestReader reader = ManifestReader.read(
-        ops.io().newInputFile(manifest.path()), ops.current().specsById())) {
-      ManifestFile manifestFile = ManifestWriter.copyAppendManifest(
-          reader, manifestPath(manifestCount.getAndIncrement()), snapshotId(), appendedManifestsSummary);
-      appendManifests.add(manifestFile);
-      // keep reference of the first appended manifest, so that we can avoid merging first bin(s)
-      // which has the first appended manifest and have not crossed the limit of minManifestsCountToMerge
-      if (firstAppendedManifest == null) {
-        firstAppendedManifest = manifestFile;
-      }
+    ManifestFile appendedManifest;
+    if (snapshotIdInheritanceEnabled && manifest.snapshotId() == null) {
+      appendedManifestsSummary.addedManifest(manifest);
+      appendManifests.add(manifest);
+      appendedManifest = manifest;
+    } else {
+      // the manifest must be rewritten with this update's snapshot ID
+      ManifestFile copiedManifest = copyManifest(manifest);
+      rewrittenAppendManifests.add(copiedManifest);
+      appendedManifest = copiedManifest;
+    }
+
+    // keep reference of the first appended manifest, so that we can avoid merging first bin(s)
 
 Review comment:
   Actually, I just realized that the appended manifests are added to metadata 
before the rewritten append manifests. That means that this reference should 
actually be the first appended manifest, or the first rewritten manifest if all 
were rewritten. The only case we have to worry about is when the first manifest 
is rewritten, but a manifest with a `null` snapshot ID is added later.
   
   It's probably okay to move on since it would be extremely rare and the only 
problem would be that a bin might get merged when it otherwise wouldn't have 
been. Not a big problem.
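
   To make the edge case concrete, here is a minimal sketch of the ordering 
described above. It is a simplified, hypothetical model and not the actual 
`MergingSnapshotProducer` code: `FirstManifestSketch`, `Manifest`, 
`metadataOrder()`, and `trackedFirstIsActuallyFirst()` are made-up names, 
snapshot ID inheritance is assumed to be enabled, and the copy step is reduced 
to creating a new object with a placeholder snapshot ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the logic in the diff; not the Iceberg API.
class FirstManifestSketch {
  // A manifest is modeled only by its path and an optional snapshot ID.
  static final class Manifest {
    final String path;
    final Long snapshotId; // null means the manifest can inherit a snapshot ID

    Manifest(String path, Long snapshotId) {
      this.path = path;
      this.snapshotId = snapshotId;
    }
  }

  private final List<Manifest> appendManifests = new ArrayList<>();
  private final List<Manifest> rewrittenAppendManifests = new ArrayList<>();
  private Manifest firstAppendedManifest = null;

  // Mirrors the shape of add(ManifestFile), assuming inheritance is enabled.
  void add(Manifest manifest) {
    Manifest appended;
    if (manifest.snapshotId == null) {
      // the manifest is used as-is and inherits the snapshot ID
      appendManifests.add(manifest);
      appended = manifest;
    } else {
      // stand-in for rewriting the manifest; 1L is a placeholder snapshot ID
      Manifest copy = new Manifest(manifest.path + "-copy", 1L);
      rewrittenAppendManifests.add(copy);
      appended = copy;
    }
    // the first manifest passed to add() is tracked, whichever list it went to
    if (firstAppendedManifest == null) {
      firstAppendedManifest = appended;
    }
  }

  // Assumption taken from the comment: appended manifests come before rewritten ones.
  List<Manifest> metadataOrder() {
    List<Manifest> all = new ArrayList<>(appendManifests);
    all.addAll(rewrittenAppendManifests);
    return all;
  }

  // True when the tracked reference is also the first manifest in metadata order.
  boolean trackedFirstIsActuallyFirst() {
    List<Manifest> order = metadataOrder();
    return order.isEmpty() || order.get(0) == firstAppendedManifest;
  }
}
```

   In this model, adding a `null` snapshot ID manifest first and a rewritten one 
second keeps the tracked reference at the head of the metadata order. Adding 
the rewritten one first and the `null` snapshot ID manifest second puts the 
later manifest ahead of the tracked one, which is the rare mismatch mentioned 
above.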

