RussellSpitzer commented on code in PR #4612:
URL: https://github.com/apache/iceberg/pull/4612#discussion_r857711524


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -257,7 +257,10 @@ private void validateDeletedManifests(Set<ManifestFile> 
currentManifests) {
         .filter(manifest -> !currentManifests.contains(manifest))
         .findAny()
         .ifPresent(manifest -> {
-          throw new ValidationException("Manifest is missing: %s", 
manifest.path());
+          throw new ValidationException("Cannot commit RewriteManifests; " +
+                  "manifests that existed at the beginning of this rewrite 
have already been removed by another " +
+                  "operation. Manifest %s, which would be replaced by this 
operation, has already been removed.",
+                  manifest.path());

Review Comment:
   I think "Manifest is missing" is a very confusing message. From my talks 
with folks who have hit this as well as other Iceberg dev's, the gut reaction 
to this message is that the table has become corrupted. After all why would 
"manifest is missing" be an issue for an operation which is removing manifests? 
Having a missing manifest also generally means that the table is no longer 
stable so users are on high alert.
   
   To further confound things, the manifest probably still exists and a user 
can find it, which leads them to ask "Why would Iceberg think the file is 
missing when it is clearly on disk?". Users often let me know in conjunction 
that Iceberg says a manifest is missing but they can see it on disk
   
   In my opinion a good error message has to explain what went wrong, why it 
went wrong, and how to respond (if such a response is possible and non-obvious).
   
   At the moment the error message at best explains the "why it went wrong" but 
without enough context to really let the user even know what part of the 
rewrite actually broke.
   
   If I was going to order information here the most important concept to get 
across is 
   1. "What" - The commit has failed (not writing, reading or deleting 
manifests)
   2. "Why" - The change we prepared no longer cleanly applies because a 
manifest we thought existed is no longer current
   3. "Remediation" - Retry the operation if desired
   
   So if we want to trim it down to the essentials
   ```
   "Cannot commit, table has been concurrently modified and manifest %s is no 
longer part of the table"
   ```
   Although even knowing which manifest is missing is not even that valid so we 
could remove that too.
   
   I actually have issues with most of our Validation messages in 
MergingSnapshotProducer for the similar reason. Users are often confused by 
"Found conflicting files that can contain records matching true". But we can 
save that for future PR's
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to