[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813949#comment-17813949 ]

ASF GitHub Bot commented on HADOOP-19047:
-----------------------------------------

steveloughran commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1925340023

   I've thought about this some more. Here are some things which I believe we 
need:
   
   1. Marker files at the end of each path so that Spark status reporting in 
different processes can get an update on an active job.
   2. A way to abort all uploads of a failed task attempt, even from a 
different process. Probably also a way to abort the entire job.
   3. Confidence that the in-memory store of pending uploads will not grow 
indefinitely.
   
   Ignoring item #3 for now, remember that we have #1 solved by adding a 
0-byte marker with a "final length" header; Spark has some special handling 
for zero-byte files, using getXAttr() and falling back to the probe, at the 
expense of a second HEAD request. Generating a modified FileStatus response 
from a single HEAD/getObjectMetadata() call would actually eliminate the need 
for that; I wish I'd thought of it myself. Yes, we do break the guarantee that 
files listed are the same size as the files opened… but magic paths are, well, 
magic. We break a lot of guarantees there already.
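   
   To make that concrete, here is a minimal Java sketch of mapping a single 
HeadObject response to a FileStatus that reports the declared final length 
instead of the marker's zero bytes; the helper class is an illustration and 
the header name is an assumption, not necessarily the shipped S3A constant:
   
   ```java
   // Sketch only: build a FileStatus from one HEAD call, substituting the
   // marker's declared final length for its (zero) content length.
   import java.util.Map;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.Path;
   import software.amazon.awssdk.services.s3.model.HeadObjectResponse;

   public final class MagicMarkerStatus {
     // assumed user-metadata key holding the declared final length
     static final String MAGIC_LENGTH_HEADER = "x-hadoop-s3a-magic-data-length";

     static FileStatus toFileStatus(Path path, HeadObjectResponse head) {
       long len = head.contentLength();            // 0 for the marker object
       Map<String, String> meta = head.metadata(); // user metadata from HEAD
       String declared = meta.get(MAGIC_LENGTH_HEADER);
       if (declared != null) {
         len = Long.parseLong(declared);           // final length of the upload
       }
       return new FileStatus(len, false, 1, 0,
           head.lastModified().toEpochMilli(), path);
     }
   }
   ```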
   
   The existing design should be retained even in memory; the calculation of 
the final length is something which can be done for all.
   
   But: we do not need to save the .pending files just for task abort. All we 
need to do is be able to enumerate the upload IDs of all the files from that 
task attempt and cancel them. We can do that just by adding another header to 
the marker file. Task commit uses the in-memory data; task abort will need a 
deep scan of the task attempt directory, with all zero-byte files carrying the 
proposed new header used to initiate abort operations (see the sketch below). 
This is only for task abort, an outlier case. For normal task commit there is 
no need to scan the directory, parse the .pending files and then generate a 
new pendingset file for later job commit. It is probably the JSON marshalling 
which is as much a performance killer here as the listing operation.
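   
   For illustration, a minimal Java sketch of that deep-scan abort. The xattr 
key is hypothetical, and the S3 key is derived naively from the marker path; 
the real committer would map the magic path back to the final destination key:
   
   ```java
   // Sketch only: abort a failed task attempt by deep-scanning its directory
   // for zero-byte markers and aborting the multipart upload named in a
   // proposed upload-ID header.
   import java.nio.charset.StandardCharsets;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.LocatedFileStatus;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.fs.RemoteIterator;
   import software.amazon.awssdk.services.s3.S3Client;

   public final class TaskAbortSketch {
     // hypothetical header stored on each marker alongside the final length
     static final String UPLOAD_ID_XATTR = "header.x-hadoop-s3a-magic-upload-id";

     static void abortTaskAttempt(FileSystem fs, S3Client s3, String bucket,
         Path taskAttemptDir) throws Exception {
       // deep scan: recursive listing of everything under the task attempt
       RemoteIterator<LocatedFileStatus> files = fs.listFiles(taskAttemptDir, true);
       while (files.hasNext()) {
         LocatedFileStatus st = files.next();
         if (st.getLen() != 0) {
           continue;                      // only zero-byte markers matter
         }
         // assumed to return null when the marker lacks the header
         byte[] id = fs.getXAttr(st.getPath(), UPLOAD_ID_XATTR);
         if (id != null) {
           s3.abortMultipartUpload(b -> b.bucket(bucket)
               .key(st.getPath().toUri().getPath().substring(1))
               .uploadId(new String(id, StandardCharsets.UTF_8)));
         }
       }
     }
   }
   ```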
   
   What do you think?




> Support InMemory Tracking Of S3A Magic Commits
> ----------------------------------------------
>
>                 Key: HADOOP-19047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19047
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> The following are the operations which happen within a Task when it uses the 
> S3A Magic Committer. 
> *During closing of stream*
> 1. A 0-byte file with the same name as the original file is uploaded to S3 
> using a PUT operation. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152]
>  for more information. This is done so that a downstream application like 
> Spark can get the size of the file which is being written.
> 2. MultiPartUpload (MPU) metadata is uploaded to S3. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176]
>  for more information.
> *During TaskCommit*
> 1. All the MPU metadata which the task wrote to S3 (there will be 'x' 
> metadata files in S3 if a single task writes to 'x' files) are read and 
> rewritten to S3 as a single metadata file. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201]
>  for more information.
> Since these operations happen within the Task JVM, we could optimize as well 
> as save cost by storing this information in memory when Task memory usage is 
> not a constraint. Hence the proposal here is to introduce a new MagicCommit 
> Tracker called "InMemoryMagicCommitTracker", which will:
> 1. Store the metadata of the MPU in memory till the Task is committed.
> 2. Store the size of the file, which can be used by a downstream application 
> to get the file size before it is committed/visible at the output path.
> This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call 
> given a Task writes only 1 file.
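
For illustration, a minimal sketch of the rough shape such a tracker could 
take; class and field names are assumptions, not the actual patch:

```java
// Sketch only: hold pending commit metadata and declared file lengths in
// static maps instead of writing .pending files and 0-byte markers to S3.
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;

public final class InMemoryTrackerSketch {
  // task attempt ID -> pending multipart uploads created by that attempt
  static final ConcurrentHashMap<String, List<SinglePendingCommit>>
      COMMITS = new ConcurrentHashMap<>();
  // destination path -> declared final length, for getFileStatus() probes
  static final ConcurrentHashMap<Path, Long> LENGTHS =
      new ConcurrentHashMap<>();

  static void trackUpload(String taskAttemptId, SinglePendingCommit commit,
      Path dest, long bytesWritten) {
    COMMITS.computeIfAbsent(taskAttemptId,
        k -> new CopyOnWriteArrayList<>()).add(commit);
    LENGTHS.put(dest, bytesWritten);
  }

  // task commit drains the attempt's uploads for the single pendingset
  // write, with no LIST of the magic directory or GET of .pending files
  static List<SinglePendingCommit> takeCommits(String taskAttemptId) {
    return COMMITS.remove(taskAttemptId);
  }
}
```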



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
