[ 
https://issues.apache.org/jira/browse/HADOOP-17139?focusedWorklogId=630730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630730
 ]

ASF GitHub Bot logged work on HADOOP-17139:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jul/21 19:41
            Start Date: 28/Jul/21 19:41
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request 
#3101:
URL: https://github.com/apache/hadoop/pull/3101#discussion_r678596071



##########
File path: 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
##########
@@ -1427,57 +1427,91 @@ set to TRUE. If destination already exists, and the 
destination contents must be
 then `overwrite` flag must be set to TRUE.
 
 #### Preconditions
+Source and destination must be different
+```python
+if src = dest : raise FileExistsException
+```
 
-The source file or directory must exist:
+Destination and source must not be descendants one another
+```python
+if isDescendant(src, dest) or isDescendant(dest, src) : TODO
+```
 
-    if not exists(FS, src) : raise FileNotFoundException
+The source file or directory must exist locally:
+```python
+if not exists(LocalFS, src) : raise FileNotFoundException
+```
 
 Directories cannot be copied into files regardless to what the overwrite flag 
is set to:
 
-    if isDir(FS, src) and isFile(FS, dst) : raise PathExistsException
+```python
+if isDir(LocalFS, src) and isFile(FS, dst) : raise PathExistsException
+```
 
 For all cases, except the one for which the above precondition throws, the 
overwrite flag must be
-set to TRUE for the operation to succeed. This will also overwrite any files / 
directories at the
-destination:
-
-    if exists(FS, dst) && not overwrite : raise PathExistsException
-
-#### Postconditions
-Copying a file into an existing directory at destination with a non-existing 
file at destination
-
-    if isFile(fs, src) and not exists(FS, dst) => success
-
-Copying a file into an existing directory at destination with an existing file 
at destination and
-overwrite set to TRUE
-
-    if isFile(FS, src) and overwrite and exists(FS, dst) => success
-
+set to TRUE for the operation to succeed if destination exists. This will also 
overwrite any files
+ / directories at the destination:
 
-Copying a file into a non-existent directory. POSIX file systems would fail 
this operation, HDFS
-allows this to happen creating all the directories in the destination path.
-
-    if isFile(FS, src) and not exists(FS, parent(dst)) => success
-
-Copying directory into destination directory - last part of the destination 
path doesn't exist e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/` with the precondition that `/dst/` 
exists but `/dst/foo/`
-doesn't:
-
-    if isDir(FS, src) and not exists(FS, dst) => success
-
-Copying directory into destination directory - last part of the destination 
path exists e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/bar/` with the precondition that 
`/dst/foo/` exists but
-`/dst/foo/bar/` doesn't:
-
-    if isDir(FS, src) and exists(FS, dst) => success
+```python
+if exists(FS, dst) and not overwrite : raise PathExistsException
+```
 
-Copying a directory into a destination directory - last part of destination 
path and source directory
-name exist e.g. `/src/foo/ -> /dst/` with the precondition that `/dst/foo/` 
exists. This operation
-will only succeed if the overwrite flag is set to TRUE
+#### Determining the final name of the copy
+Given a base path on the source `base` and a child path `child` where `base` 
is in
+`ancestors(child) + child`:
 
-    if isDir(FS, src) and exists(FS, dst) and overwrite => success
+```python
+def final_name(base, child, dest):
+    is base = child:
+        return dest
+    else:
+        return dest + childElements(base, child)
+```
 
-For all operations if the `delSrc` flag is set to TRUE then the source will be 
deleted. If source
-is a directory then it will be recursively deleted.
+#### Outcome where source is a file `isFile(LocalFS, src)`
+For a file, data at destination becomes that of the source. All ancestors are 
directories.
+```python
+if isFile(LocalFS, src) and (not exists(FS, dest) or (exists(FS, dest) and 
overwrite)):
+    FS' = FS where:
+        FS'.Files[dest] = LocalFS.Files[src]
+        FS'.Directories = FS.Directories + ancestors(FS, dest)
+    LocalFS' = LocalFS where
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+else if isFile(LocalFS, src) and isDir(FS, dest):
+    FS' = FS where:
+        let d = final_name(src, dest)
+        FS'.Files[d] = LocalFS.Files[src]
+    LocalFS' = LocalFS where:
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+```
+There are no expectations that the file changes are atomic for both local 
`LocalFS` and remote `FS`.
+
+#### Outcome where source is a directory `isDir(LocalFS, src)`
+```python
+if is Dir(LocalFS, src) and (isFile(FS, dest) or isFile(FS, dest + 
childElements(src))):
+    raise FileAlreadyExistsException
+else if isDir(LocalFS, src):
+    dest' = dest

Review comment:
       move to the else of the `if` below so it can be final

##########
File path: 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
##########
@@ -1427,57 +1427,91 @@ set to TRUE. If destination already exists, and the 
destination contents must be
 then `overwrite` flag must be set to TRUE.
 
 #### Preconditions
+Source and destination must be different
+```python
+if src = dest : raise FileExistsException
+```
 
-The source file or directory must exist:
+Destination and source must not be descendants one another
+```python
+if isDescendant(src, dest) or isDescendant(dest, src) : TODO
+```
 
-    if not exists(FS, src) : raise FileNotFoundException
+The source file or directory must exist locally:
+```python
+if not exists(LocalFS, src) : raise FileNotFoundException
+```
 
 Directories cannot be copied into files regardless to what the overwrite flag 
is set to:
 
-    if isDir(FS, src) and isFile(FS, dst) : raise PathExistsException
+```python
+if isDir(LocalFS, src) and isFile(FS, dst) : raise PathExistsException
+```
 
 For all cases, except the one for which the above precondition throws, the 
overwrite flag must be
-set to TRUE for the operation to succeed. This will also overwrite any files / 
directories at the
-destination:
-
-    if exists(FS, dst) && not overwrite : raise PathExistsException
-
-#### Postconditions
-Copying a file into an existing directory at destination with a non-existing 
file at destination
-
-    if isFile(fs, src) and not exists(FS, dst) => success
-
-Copying a file into an existing directory at destination with an existing file 
at destination and
-overwrite set to TRUE
-
-    if isFile(FS, src) and overwrite and exists(FS, dst) => success
-
+set to TRUE for the operation to succeed if destination exists. This will also 
overwrite any files
+ / directories at the destination:
 
-Copying a file into a non-existent directory. POSIX file systems would fail 
this operation, HDFS
-allows this to happen creating all the directories in the destination path.
-
-    if isFile(FS, src) and not exists(FS, parent(dst)) => success
-
-Copying directory into destination directory - last part of the destination 
path doesn't exist e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/` with the precondition that `/dst/` 
exists but `/dst/foo/`
-doesn't:
-
-    if isDir(FS, src) and not exists(FS, dst) => success
-
-Copying directory into destination directory - last part of the destination 
path exists e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/bar/` with the precondition that 
`/dst/foo/` exists but
-`/dst/foo/bar/` doesn't:
-
-    if isDir(FS, src) and exists(FS, dst) => success
+```python
+if exists(FS, dst) and not overwrite : raise PathExistsException
+```
 
-Copying a directory into a destination directory - last part of destination 
path and source directory
-name exist e.g. `/src/foo/ -> /dst/` with the precondition that `/dst/foo/` 
exists. This operation
-will only succeed if the overwrite flag is set to TRUE
+#### Determining the final name of the copy
+Given a base path on the source `base` and a child path `child` where `base` 
is in
+`ancestors(child) + child`:
 
-    if isDir(FS, src) and exists(FS, dst) and overwrite => success
+```python
+def final_name(base, child, dest):
+    is base = child:
+        return dest
+    else:
+        return dest + childElements(base, child)
+```
 
-For all operations if the `delSrc` flag is set to TRUE then the source will be 
deleted. If source
-is a directory then it will be recursively deleted.
+#### Outcome where source is a file `isFile(LocalFS, src)`
+For a file, data at destination becomes that of the source. All ancestors are 
directories.
+```python
+if isFile(LocalFS, src) and (not exists(FS, dest) or (exists(FS, dest) and 
overwrite)):
+    FS' = FS where:
+        FS'.Files[dest] = LocalFS.Files[src]
+        FS'.Directories = FS.Directories + ancestors(FS, dest)
+    LocalFS' = LocalFS where
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+else if isFile(LocalFS, src) and isDir(FS, dest):
+    FS' = FS where:
+        let d = final_name(src, dest)
+        FS'.Files[d] = LocalFS.Files[src]
+    LocalFS' = LocalFS where:
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+```
+There are no expectations that the file changes are atomic for both local 
`LocalFS` and remote `FS`.
+
+#### Outcome where source is a directory `isDir(LocalFS, src)`
+```python
+if is Dir(LocalFS, src) and (isFile(FS, dest) or isFile(FS, dest + 
childElements(src))):

Review comment:
       `isDir()`

##########
File path: 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
##########
@@ -1427,57 +1427,91 @@ set to TRUE. If destination already exists, and the 
destination contents must be
 then `overwrite` flag must be set to TRUE.
 
 #### Preconditions
+Source and destination must be different
+```python
+if src = dest : raise FileExistsException
+```
 
-The source file or directory must exist:
+Destination and source must not be descendants one another
+```python
+if isDescendant(src, dest) or isDescendant(dest, src) : TODO

Review comment:
       `raise IOException`

##########
File path: 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
##########
@@ -1427,57 +1427,91 @@ set to TRUE. If destination already exists, and the 
destination contents must be
 then `overwrite` flag must be set to TRUE.
 
 #### Preconditions
+Source and destination must be different
+```python
+if src = dest : raise FileExistsException
+```
 
-The source file or directory must exist:
+Destination and source must not be descendants one another
+```python
+if isDescendant(src, dest) or isDescendant(dest, src) : TODO
+```
 
-    if not exists(FS, src) : raise FileNotFoundException
+The source file or directory must exist locally:
+```python
+if not exists(LocalFS, src) : raise FileNotFoundException
+```
 
 Directories cannot be copied into files regardless to what the overwrite flag 
is set to:
 
-    if isDir(FS, src) and isFile(FS, dst) : raise PathExistsException
+```python
+if isDir(LocalFS, src) and isFile(FS, dst) : raise PathExistsException
+```
 
 For all cases, except the one for which the above precondition throws, the 
overwrite flag must be
-set to TRUE for the operation to succeed. This will also overwrite any files / 
directories at the
-destination:
-
-    if exists(FS, dst) && not overwrite : raise PathExistsException
-
-#### Postconditions
-Copying a file into an existing directory at destination with a non-existing 
file at destination
-
-    if isFile(fs, src) and not exists(FS, dst) => success
-
-Copying a file into an existing directory at destination with an existing file 
at destination and
-overwrite set to TRUE
-
-    if isFile(FS, src) and overwrite and exists(FS, dst) => success
-
+set to TRUE for the operation to succeed if destination exists. This will also 
overwrite any files
+ / directories at the destination:
 
-Copying a file into a non-existent directory. POSIX file systems would fail 
this operation, HDFS
-allows this to happen creating all the directories in the destination path.
-
-    if isFile(FS, src) and not exists(FS, parent(dst)) => success
-
-Copying directory into destination directory - last part of the destination 
path doesn't exist e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/` with the precondition that `/dst/` 
exists but `/dst/foo/`
-doesn't:
-
-    if isDir(FS, src) and not exists(FS, dst) => success
-
-Copying directory into destination directory - last part of the destination 
path exists e.g.
-`/src/bar/ -> /dst/foo/ => /dst/foo/bar/` with the precondition that 
`/dst/foo/` exists but
-`/dst/foo/bar/` doesn't:
-
-    if isDir(FS, src) and exists(FS, dst) => success
+```python
+if exists(FS, dst) and not overwrite : raise PathExistsException
+```
 
-Copying a directory into a destination directory - last part of destination 
path and source directory
-name exist e.g. `/src/foo/ -> /dst/` with the precondition that `/dst/foo/` 
exists. This operation
-will only succeed if the overwrite flag is set to TRUE
+#### Determining the final name of the copy
+Given a base path on the source `base` and a child path `child` where `base` 
is in
+`ancestors(child) + child`:
 
-    if isDir(FS, src) and exists(FS, dst) and overwrite => success
+```python
+def final_name(base, child, dest):
+    is base = child:
+        return dest
+    else:
+        return dest + childElements(base, child)
+```
 
-For all operations if the `delSrc` flag is set to TRUE then the source will be 
deleted. If source
-is a directory then it will be recursively deleted.
+#### Outcome where source is a file `isFile(LocalFS, src)`
+For a file, data at destination becomes that of the source. All ancestors are 
directories.
+```python
+if isFile(LocalFS, src) and (not exists(FS, dest) or (exists(FS, dest) and 
overwrite)):
+    FS' = FS where:
+        FS'.Files[dest] = LocalFS.Files[src]
+        FS'.Directories = FS.Directories + ancestors(FS, dest)
+    LocalFS' = LocalFS where
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+else if isFile(LocalFS, src) and isDir(FS, dest):
+    FS' = FS where:
+        let d = final_name(src, dest)
+        FS'.Files[d] = LocalFS.Files[src]
+    LocalFS' = LocalFS where:
+        not delSrc or (delSrc = true and delete(LocalFS, src, false))
+```
+There are no expectations that the file changes are atomic for both local 
`LocalFS` and remote `FS`.
+
+#### Outcome where source is a directory `isDir(LocalFS, src)`
+```python
+if is Dir(LocalFS, src) and (isFile(FS, dest) or isFile(FS, dest + 
childElements(src))):
+    raise FileAlreadyExistsException
+else if isDir(LocalFS, src):
+    dest' = dest
+    if exists(FS, dest)
+        dest' = dest + childElements(src)
+        if exists(FS, dest') and not overwrite:
+            raise PathExistsException
+
+    FS' = FS where:
+        forall c in descendants(LocalFS, src):
+            not exists(FS', final_name(c)) or overwrite
+        and forall c in descendants(LocalFS, src) where isDir(LocalFS, c):
+            FS'.Directories = FS'.Directories + (dest' + childElements(src, c))
+        and forall c in descendants(LocalFS, src) where isFile(LocalFS, c):
+            FS'.Files[final_name(c, dest')] = LocalFS.Files[c]
+    LocalFS' = LocalFS where
+        not delSrc or (delSrc = true and delete(LocalFS, src, true))
+```
+There are no expectations of operation isolation / atomicity. This means files 
can change
+in source or destination while the operation is executing. No guarantees are 
made for the

Review comment:
       nit: split these each to a new line




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 630730)
    Time Spent: 8h 40m  (was: 8.5h)

> Re-enable optimized copyFromLocal implementation in S3AFileSystem
> -----------------------------------------------------------------
>
>                 Key: HADOOP-17139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17139
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0, 3.2.1
>            Reporter: Sahil Takiar
>            Assignee: Bogdan Stolojan
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> It looks like HADOOP-15932 disabled the optimized copyFromLocal 
> implementation in S3A for correctness reasons.  innerCopyFromLocalFile should 
> be fixed and re-enabled. The current implementation uses 
> FileSystem.copyFromLocal which will open an input stream from the local fs 
> and an output stream to the destination fs, and then call IOUtils.copyBytes. 
> With default configs, this will cause S3A to read the file into memory, write 
> it back to a file on the local fs, and then when the file is closed, upload 
> it to S3.
> The optimized version of copyFromLocal in innerCopyFromLocalFile, directly 
> creates a PutObjectRequest request with the local file as the input.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to