rdblue commented on a change in pull request #675: Inherit snapshot ids for 
manifest entries
URL: https://github.com/apache/incubator-iceberg/pull/675#discussion_r364357342
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/FastAppend.java
 ##########
 @@ -81,15 +83,21 @@ public FastAppend appendFile(DataFile file) {
 
   @Override
   public FastAppend appendManifest(ManifestFile manifest) {
-    // the manifest must be rewritten with this update's snapshot ID
-    try (ManifestReader reader = ManifestReader.read(
-        ops.io().newInputFile(manifest.path()), ops.current().specsById())) {
-      OutputFile newManifestPath = 
manifestPath(manifestCount.getAndIncrement());
-      appendManifests.add(ManifestWriter.copyAppendManifest(reader, 
newManifestPath, snapshotId(), summaryBuilder));
+    Preconditions.checkArgument(!manifest.hasExistingFiles(), "Cannot append 
manifest with existing files");
+    Preconditions.checkArgument(!manifest.hasDeletedFiles(), "Cannot append 
manifest with deleted files");
+    Preconditions.checkArgument(manifest.snapshotId() == null, "Snapshot id 
must be assigned during commit");
+
+    // TODO: avoid reading manifests to simply get stats
+    try (ManifestReader reader = ManifestReader.read(manifest, ops.io(), 
ops.current().specsById())) {
 
 Review comment:
   Ideally, we would produce all of the summary stats, but the most important 
ones are `total-data-files`, `total-records`, and the `added-` or `deleted-` 
properties that are used to produce totals. I think it's okay to not write the 
`changed-partition-count` metrics if they require scanning the appended 
manifest.
   
   I think my response was confusing because we keep additional summary 
information about each partition in our version. I can move that upstream if 
everyone wants it, but it can make the metadata files quite large. Without a 
use case for doing this upstream, I didn't think it was a good idea to make 
everyone's metadata significantly larger.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to