rdblue commented on issue #1014:
URL: https://github.com/apache/incubator-iceberg/issues/1014#issuecomment-631684123


   @vanliu-tx, the `UpdateEvent` problem is happening because the snapshot is 
missing. Just after the commit, the snapshot is loaded to clean up uncommitted 
files and to send the update event. Both of those tasks are failing to load the 
committed snapshot. Note the log just above the NPE:
   
   ```
   [2020-05-08 15:44:17,616] pool-1-thread-49 WARN  Failed to load committed snapshot, skipping manifest clean-up (org.apache.iceberg.SnapshotProducer:291)
   ```
   
   I think the problem is that the local FS doesn't provide the kind of atomic 
rename that is needed here. From [`man rename(2)`](https://linux.die.net/man/2/rename), 
which is what `File.renameTo` uses:
   
   > If newpath already exists it will be atomically replaced
   
   So a rename is "atomic" in the sense that the destination is replaced "so 
that there is no point at which another process attempting to access newpath 
will find it missing". But that's not the atomic rename that HDFS provides, 
where rename fails if the destination path already exists. A quick test 
confirms that a second rename to the same destination succeeds:
   
   ```
   fs.rename(p1, dest) -> true
   fs.rename(p2, dest) -> true
   ```
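   
   The quick test above can be reproduced with plain `java.io.File` along these 
lines. This is only a minimal sketch, not the test that was actually run: the 
class name `RenameOverwriteDemo` and the file names are made up for 
illustration, and it calls `File.renameTo` directly rather than Hadoop's 
`fs.rename`. The result of the second rename depends on the platform, so the 
sketch reports it instead of assuming it.
   
   ```java
   import java.io.File;
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;

   // Hypothetical demo class, not part of Iceberg or Hadoop.
   class RenameOverwriteDemo {
     /**
      * Renames two files onto the same destination inside a fresh temp directory.
      * Returns {first rename result, second rename result, final content of dest}.
      */
     static String[] renameTwice() {
       try {
         File dir = Files.createTempDirectory("rename-demo").toFile();
         File p1 = new File(dir, "p1");
         File p2 = new File(dir, "p2");
         File dest = new File(dir, "dest");
         Files.write(p1.toPath(), "one".getBytes(StandardCharsets.UTF_8));
         Files.write(p2.toPath(), "two".getBytes(StandardCharsets.UTF_8));

         boolean first = p1.renameTo(dest);   // dest does not exist yet
         boolean second = p2.renameTo(dest);  // dest already exists at this point
         String content =
             new String(Files.readAllBytes(dest.toPath()), StandardCharsets.UTF_8);
         return new String[] {String.valueOf(first), String.valueOf(second), content};
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }

     public static void main(String[] args) {
       String[] result = renameTwice();
       System.out.println("rename(p1, dest) -> " + result[0]);
       System.out.println("rename(p2, dest) -> " + result[1]);
       System.out.println("dest now contains: " + result[2]);
     }
   }
   ```
   
   If the second rename reports `true`, `dest` holds the content of `p2`, 
which means the first file was replaced without any error being raised.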
   
   Also, looking at the exceptions in your log, the stack traces point to only 
2 lines in the commit method:
   ```
   org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:123)
   org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:151)
   ```
   
   The first one is after the check whether base is still current, and the 
second one is the `exists` check before the rename operation. None of the 
exceptions come from the `rename` operation itself failing.
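   
   To make the failure mode concrete: an `exists` check followed by a rename is 
only safe if the rename itself refuses to overwrite. The sketch below is a 
simplified, hypothetical model and not the code in `HadoopTableOperations`; the 
class `CheckThenRenameDemo`, its method names, and the file names are 
assumptions for illustration. It interleaves two writers by hand so that both 
pass the check before either one renames.
   
   ```java
   import java.io.File;
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;

   // Hypothetical model of a check-then-rename commit, for illustration only.
   class CheckThenRenameDemo {
     /** Step 1: the writer may proceed only if the destination is absent. */
     static boolean destinationIsFree(File dest) {
       return !dest.exists();
     }

     /** Step 2: move the writer's temp file into place. */
     static boolean moveIntoPlace(File temp, File dest) {
       return temp.renameTo(dest);
     }

     /**
      * Interleaves two writers: both run the check first, then both rename.
      * Returns {A passed check, B passed check, A committed, B committed}.
      */
     static boolean[] raceTwoWriters() {
       try {
         File dir = Files.createTempDirectory("commit-demo").toFile();
         File tempA = new File(dir, "writer-a.tmp");
         File tempB = new File(dir, "writer-b.tmp");
         File dest = new File(dir, "v2.metadata.json"); // illustrative name
         Files.write(tempA.toPath(), "A".getBytes(StandardCharsets.UTF_8));
         Files.write(tempB.toPath(), "B".getBytes(StandardCharsets.UTF_8));

         // Both writers check before either has renamed, so both see no file.
         boolean aMayCommit = destinationIsFree(dest);
         boolean bMayCommit = destinationIsFree(dest);

         // Whether B's rename succeeds now depends entirely on the file system.
         boolean aCommitted = aMayCommit && moveIntoPlace(tempA, dest);
         boolean bCommitted = bMayCommit && moveIntoPlace(tempB, dest);
         return new boolean[] {aMayCommit, bMayCommit, aCommitted, bCommitted};
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }

     public static void main(String[] args) {
       boolean[] r = raceTwoWriters();
       System.out.println("A committed: " + r[2] + ", B committed: " + r[3]);
     }
   }
   ```
   
   On a file system where rename replaces the destination, both writers report 
success and one writer's file silently overwrites the other's. Where rename 
fails on an existing destination, the second writer gets `false` and can 
detect the conflict.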
   
   So it looks like the problem is that a POSIX FS isn't safe to use with 
`HadoopTableOperations`. Can you try using HDFS instead?
   
   (FYI @aokolnychyi)

