HeartSaVioR commented on pull request #1559:
URL: https://github.com/apache/iceberg/pull/1559#issuecomment-708376836
To clarify, the API I mentioned was `FileContext.rename(src, dst, options)`,
not `FileSystem.rename(src, dst)`, which is what the filesystem doc documents.
I don't have an HDFS cluster at hand right now, but the result of the rename
operation against the local filesystem via `FileContext` is quite different
from what the document says.
Below is code you can run with spark-shell against Hadoop 2.7 and Hadoop 3.2.
```
import org.apache.hadoop.fs.{FileContext, Path}
import org.apache.hadoop.fs.Options.Rename
import org.apache.hadoop.fs.permission.FsPermission
val context = FileContext.getFileContext()
// assuming you have files `unit-tests.log` and `unit-tests-succeed.log` (with different file sizes) in /tmp
val setupPath = new Path("/tmp/unit-tests-succeed.log")
val anotherFilePath = new Path("/tmp/unit-tests.log")
val sourceDirPath = new Path("/tmp/rename-experiment-src")
val destDirPath = new Path("/tmp/rename-experiment-dst")
val anotherFileSourcePath = new Path("/tmp/rename-experiment-src/unit-tests.log")
val sourcePath = new Path("/tmp/rename-experiment-src/unit-tests-succeed.log")
val destPath = new Path("/tmp/rename-experiment-dst/unit-tests-succeed.log")
// remove directories
context.delete(sourceDirPath, true)
context.delete(destDirPath, true)
context.mkdir(sourceDirPath, FsPermission.getDirDefault(), true)
context.mkdir(destDirPath, FsPermission.getDirDefault(), true)
// setup file
context.util.copy(setupPath, sourcePath)
// the file got moved
context.rename(sourcePath, destPath)
// check whether the file is moved
println(s"src path: ${context.util.exists(sourcePath)}")
println(s"dest path: ${context.util.exists(destPath)}")
println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")
// re-setup file
context.util.copy(setupPath, sourcePath)
// re-rename -> this will throw exception as file already exists
context.rename(sourcePath, destPath)
// setup another file
context.util.copy(anotherFilePath, anotherFileSourcePath)
// re-rename with overwrite option -> this will not throw exception
context.rename(anotherFileSourcePath, destPath, Rename.OVERWRITE)
// check whether the file is moved
println(s"src path: ${context.util.exists(anotherFileSourcePath)}")
println(s"dest path: ${context.util.exists(destPath)}")
println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")
```
It correctly fails on an existing file in the destination, and correctly
overwrites it with the new file when the overwrite option is provided.
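For comparison, the JDK's `java.nio.file.Files.move` on a local filesystem shows the same two-mode semantics (fail on an existing destination by default, overwrite only when explicitly requested). This is just an analogy to illustrate the behavior above, not the API HDFS uses:

```scala
import java.nio.file.{Files, FileAlreadyExistsException, StandardCopyOption}

val dir = Files.createTempDirectory("rename-semantics")
val src = Files.write(dir.resolve("src.txt"), "new".getBytes)
val dst = Files.write(dir.resolve("dst.txt"), "old".getBytes)

// Without REPLACE_EXISTING, moving onto an existing file fails,
// like FileContext.rename without Rename.OVERWRITE.
val failedWithoutOverwrite =
  try { Files.move(src, dst); false }
  catch { case _: FileAlreadyExistsException => true }
println(s"plain move failed: $failedWithoutOverwrite") // true

// With REPLACE_EXISTING it overwrites, like Rename.OVERWRITE.
Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING)
println(new String(Files.readAllBytes(dst))) // new
```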
I also looked through the code path for how the namenode handles rename, and
it led me to `DistributedFileSystem.rename`, whose javadoc says it guarantees
atomicity.
> rel/release-2.7.4
https://github.com/apache/hadoop/blob/cd915e1e8d9d0131462a0b7301586c175728a282/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L647-L653
> rel/release-3.2.0
https://github.com/apache/hadoop/blob/e97acb3bd8f3befd27418996fa5d4b50bf2e17bf/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L892-L898