jams-xin opened a new issue, #14857:
URL: https://github.com/apache/iceberg/issues/14857

   ### Apache Iceberg version
   
   1.4.2
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Hi.
   Many Spark jobs are concurrently writing partition data to a single table. During the final metadata commit phase (`HadoopTableOperations::commit`), if multiple processes execute the rename operation concurrently, the commit is not safe. One job's `metadata.json` can be overwritten by another job's, so the overwritten job's partition data ends up missing from the table, even though that Spark job reports success.
   Related source code (`HadoopTableOperations::renameToFinal`):
   ```java
   private void renameToFinal(FileSystem fs, Path src, Path dst, int nextVersion) {
     try {
       lockManager.acquire(dst.toString(), src.toString());
       if (fs.exists(dst)) {
         throw new CommitFailedException("Version %d already exists: %s", nextVersion, dst);
       }

       if (!fs.rename(src, dst)) {
         CommitFailedException cfe =
             new CommitFailedException("Failed to commit changes using rename: %s", dst);
         RuntimeException re = tryDelete(src);
         if (re != null) {
           cfe.addSuppressed(re);
         }
         throw cfe;
       }
     } catch (IOException e) {
       CommitFailedException cfe =
           new CommitFailedException(e, "Failed to commit changes using rename: %s", dst);
       RuntimeException re = tryDelete(src);
       if (re != null) {
         cfe.addSuppressed(re);
       }
       throw cfe;
     } finally {
       lockManager.release(dst.toString(), src.toString());
     }
   }
   ```
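   A minimal, standalone sketch of the failure mode described above. This is not Iceberg code: it assumes a local POSIX filesystem, uses `java.nio.file` in place of Hadoop's `FileSystem`, and gives each writer its own `ReentrantLock` to model a lock that only coordinates within one JVM (an assumption about the reporter's setup). The class, interface, and method names (`RenameRaceDemo`, `Commit`, `racyCommit`, `linkCommit`, `runDemo`) are hypothetical. A `CountDownLatch` forces both writers past the `exists` check before either renames, so both "commits" succeed and one metadata file silently replaces the other. For contrast, the second variant swaps in a different technique, publishing with a hard link, which fails atomically when the target already exists, so exactly one writer wins.

   ```java
   import java.nio.file.FileAlreadyExistsException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.ReentrantLock;

   public class RenameRaceDemo {

     /** Hypothetical commit step: publish src as dst, return true if it "succeeded". */
     interface Commit {
       boolean run(Path src, Path dst, CountDownLatch latch) throws Exception;
     }

     // Check-then-rename guarded by a lock private to this writer,
     // modelling a lock that cannot see writers in other JVMs.
     static boolean racyCommit(Path src, Path dst, CountDownLatch latch) throws Exception {
       ReentrantLock processLocalLock = new ReentrantLock();
       processLocalLock.lock();
       try {
         if (Files.exists(dst)) {
           return false; // the "Version %d already exists" branch
         }
         // Force the bad interleaving: every writer passes the check before any rename.
         latch.countDown();
         latch.await(5, TimeUnit.SECONDS);
         // A rename that replaces an existing destination, as POSIX rename(2) does.
         Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
         return true;
       } finally {
         processLocalLock.unlock();
       }
     }

     // No-clobber publish: link(2) fails with EEXIST when dst already exists,
     // so the existence check and the publish are one filesystem operation.
     static boolean linkCommit(Path src, Path dst, CountDownLatch latch) throws Exception {
       latch.countDown();
       latch.await(5, TimeUnit.SECONDS);
       try {
         Files.createLink(dst, src);
         return true;
       } catch (FileAlreadyExistsException e) {
         return false; // another writer already published this version
       } finally {
         Files.deleteIfExists(src); // remove the temporary name either way
       }
     }

     /** Runs two writers racing for the same version file; returns how many report success. */
     static int runDemo(boolean useLink) {
       Commit commit = useLink ? RenameRaceDemo::linkCommit : RenameRaceDemo::racyCommit;
       ExecutorService pool = Executors.newFixedThreadPool(2);
       try {
         Path dir = Files.createTempDirectory("commit-race");
         Path dst = dir.resolve("v2.metadata.json");
         CountDownLatch latch = new CountDownLatch(2);
         List<Future<Boolean>> results = new ArrayList<>();
         for (int writer = 0; writer < 2; writer++) {
           Path src = dir.resolve("writer-" + writer + ".tmp");
           Files.writeString(src, "metadata from writer " + writer);
           results.add(pool.submit(() -> commit.run(src, dst, latch)));
         }
         int successes = 0;
         for (Future<Boolean> result : results) {
           if (result.get()) {
             successes++;
           }
         }
         return successes;
       } catch (Exception e) {
         throw new IllegalStateException(e);
       } finally {
         pool.shutdown();
       }
     }

     public static void main(String[] args) {
       System.out.println("check-then-rename successes: " + runDemo(false)); // 2
       System.out.println("hard-link successes: " + runDemo(true)); // 1
     }
   }
   ```

   Running `main` should report two successful check-then-rename commits and one successful hard-link commit. The sketch says nothing about which lock manager a given Iceberg deployment uses; it only illustrates why a lock that other processes cannot see, combined with a separate `exists()` check and an overwriting rename, cannot stop two independent jobs from both committing the same version.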
   Has anyone encountered a similar issue, and if so, how was it resolved? Thanks!
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [x] I cannot contribute a fix for this bug at this time

