rdblue commented on issue #1014: URL: https://github.com/apache/incubator-iceberg/issues/1014#issuecomment-631684123
@vanliu-tx, the `UpdateEvent` problem is happening because the snapshot is missing. Just after the commit, the snapshot is loaded to clean up uncommitted files and to send the update event. Both of those tasks are failing to load the committed snapshot. Note the log just above the NPE:

```
[2020-05-08 15:44:17,616] pool-1-thread-49 WARN Failed to load committed snapshot, skipping manifest clean-up (org.apache.iceberg.SnapshotProducer:291)
```

I think the problem is that the local FS doesn't provide an atomic rename. From [`man rename(2)`](https://linux.die.net/man/2/rename), which is used by `File.renameTo`:

> If newpath already exists it will be atomically replaced

So it looks like a rename is "atomic" in that the destination is replaced "so that there is no point at which another process attempting to access newpath will find it missing". But that's not the atomic rename that HDFS provides, where rename fails if the destination path exists. A quick test validates this behavior:

```
fs.rename(p1, dest) -> true
fs.rename(p2, dest) -> true
```

Also, looking at the exceptions thrown in your log, I see only 2 lines in the commit method in stack traces:

```
org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:123)
org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:151)
```

The first one is after the check whether base is still current, and the second one is the `exists` check before the rename operation. There are no exceptions thrown from the `rename` operation failing.

So it looks like the problem is that a POSIX FS isn't safe to use with `HadoopTableOperations`. Can you try using HDFS instead?

(FYI @aokolnychyi)
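To make the difference between the two rename behaviors concrete, here is a minimal sketch using only the JDK. It is not Iceberg or Hadoop code: the class `RenameSemantics` and the helpers `renameIfAbsent` and `demo` are hypothetical names made up for illustration. It contrasts a move that refuses to overwrite an existing destination (the behavior a commit needs) with `File.renameTo`, whose result when the destination exists is platform-dependent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Iceberg or Hadoop.
class RenameSemantics {

  /** Moves src to dest, returning false instead of replacing an existing dest. */
  static boolean renameIfAbsent(Path src, Path dest) {
    try {
      // Without REPLACE_EXISTING, Files.move fails if dest already exists.
      // Caveat: the Javadoc does not promise that the existence check and the
      // move happen as one atomic step, so this only illustrates the
      // "fail if present" semantics; it is not a safe commit primitive.
      Files.move(src, dest);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Three writers try to publish the same destination in sequence.
   * Returns {first renameIfAbsent, second renameIfAbsent, File.renameTo}.
   */
  static boolean[] demo() {
    try {
      Path dir = Files.createTempDirectory("rename-demo");
      Path p1 = Files.write(dir.resolve("p1"), "writer 1".getBytes(StandardCharsets.UTF_8));
      Path p2 = Files.write(dir.resolve("p2"), "writer 2".getBytes(StandardCharsets.UTF_8));
      Path p3 = Files.write(dir.resolve("p3"), "writer 3".getBytes(StandardCharsets.UTF_8));
      Path dest = dir.resolve("dest");

      boolean first = renameIfAbsent(p1, dest);   // dest is absent, so the move succeeds
      boolean second = renameIfAbsent(p2, dest);  // dest exists, so the move is refused

      // File.renameTo with an existing destination: the outcome depends on the
      // platform. On POSIX systems it is backed by rename(2), which replaces dest.
      boolean third = p3.toFile().renameTo(dest.toFile());

      return new boolean[] {first, second, third};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("renameIfAbsent(p1, dest) -> " + r[0]);
    System.out.println("renameIfAbsent(p2, dest) -> " + r[1]);
    System.out.println("File.renameTo(p3, dest)  -> " + r[2]);
  }
}
```

If the third result is `true` on your machine, a second committer silently overwrote the first one's file, which is the failure mode described above.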