HeartSaVioR commented on pull request #1559:
URL: https://github.com/apache/iceberg/pull/1559#issuecomment-708376836


   To clarify, the API I mentioned was `FileContext.rename(src, dst, options)`, not `FileSystem.rename(src, dst)`, which is the one the filesystem doc documents.
   
   I don't have an HDFS cluster at hand right now, but the result of the rename operation against the local filesystem via FileContext is quite different from what the document says.
   
   Below is code you can run in spark-shell against Hadoop 2.7 and Hadoop 3.2.
   
   ```scala
   import org.apache.hadoop.fs.{FileContext, Path}
   import org.apache.hadoop.fs.Options.Rename
   import org.apache.hadoop.fs.permission.FsPermission

   val context = FileContext.getFileContext()

   // assuming you have files `unit-tests.log` and `unit-tests-succeed.log`
   // (with different file sizes) in /tmp

   val setupPath = new Path("/tmp/unit-tests-succeed.log")
   val anotherFilePath = new Path("/tmp/unit-tests.log")
   val sourceDirPath = new Path("/tmp/rename-experiment-src")
   val destDirPath = new Path("/tmp/rename-experiment-dst")
   val anotherFileSourcePath = new Path("/tmp/rename-experiment-src/unit-tests.log")
   val sourcePath = new Path("/tmp/rename-experiment-src/unit-tests-succeed.log")
   val destPath = new Path("/tmp/rename-experiment-dst/unit-tests-succeed.log")

   // remove and recreate the experiment directories
   context.delete(sourceDirPath, true)
   context.delete(destDirPath, true)
   context.mkdir(sourceDirPath, FsPermission.getDirDefault(), true)
   context.mkdir(destDirPath, FsPermission.getDirDefault(), true)

   // set up the source file
   context.util.copy(setupPath, sourcePath)

   // the file gets moved
   context.rename(sourcePath, destPath)

   // check whether the file was moved
   println(s"src path exists: ${context.util.exists(sourcePath)}")
   println(s"dest path exists: ${context.util.exists(destPath)}")
   println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")

   // re-create the source file
   context.util.copy(setupPath, sourcePath)

   // rename again -> this throws an exception, as the destination file already exists
   context.rename(sourcePath, destPath)

   // set up another source file
   context.util.copy(anotherFilePath, anotherFileSourcePath)

   // rename again with the overwrite option -> this does not throw
   context.rename(anotherFileSourcePath, destPath, Rename.OVERWRITE)

   // check whether the file was moved
   println(s"src path exists: ${context.util.exists(anotherFileSourcePath)}")
   println(s"dest path exists: ${context.util.exists(destPath)}")
   println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")
   ```
   
   It correctly fails when a file already exists at the destination, and correctly overwrites that file when the overwrite option is provided.
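   As an aside, the same fail-vs-overwrite contract can be demonstrated without any Hadoop dependency, using `java.nio.file.Files.move` on the local filesystem. This is only an analogy, not Hadoop code: here `REPLACE_EXISTING` plays the role that `Rename.OVERWRITE` plays in `FileContext.rename`.

   ```java
   import java.io.IOException;
   import java.nio.file.FileAlreadyExistsException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   public class RenameDemo {
       public static void main(String[] args) throws IOException {
           Path dir = Files.createTempDirectory("rename-demo");
           Path src = Files.writeString(dir.resolve("src.txt"), "new content");
           Path dst = Files.writeString(dir.resolve("dst.txt"), "old content");

           // Without REPLACE_EXISTING, the move fails on an existing destination,
           // just like FileContext.rename without Rename.OVERWRITE.
           try {
               Files.move(src, dst);
               System.out.println("moved without overwrite");
           } catch (FileAlreadyExistsException e) {
               System.out.println("failed: destination exists");
           }

           // With REPLACE_EXISTING, the existing file is overwritten,
           // analogous to passing Rename.OVERWRITE.
           Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
           System.out.println("overwritten: " + Files.readString(dst));
       }
   }
   ```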
   
   I also looked through the code path for how the NameNode handles rename, and it led me to `DistributedFileSystem.rename`, whose javadoc says it guarantees atomicity.
   
   > rel/release-2.7.4
   
   
https://github.com/apache/hadoop/blob/cd915e1e8d9d0131462a0b7301586c175728a282/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L647-L653
   
   > rel/release-3.2.0
   
   
https://github.com/apache/hadoop/blob/e97acb3bd8f3befd27418996fa5d4b50bf2e17bf/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L892-L898
   