[
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428975#comment-15428975
]
Chen He edited comment on HADOOP-9565 at 8/19/16 10:52 PM:
-----------------------------------------------------------
>From our experience, the main renaming overhead comes from
>"FileOutputCommitter.commitTask()", because it moves the files from the temp
>dir to the dest dir. Some frameworks may not care whether the final task files
>are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add a
>parameter such as "mapreduce.skip.task.commit" (default false), so that
>once a task is done, its output just stays in "dst/_temporary/0/_temporary/"?
>Then the next job or application just needs to take "dst/" as its input dir;
>it does not care how deep the files are. This avoids the atomic write
>issue, provides compatibility, and avoids the rename overhead. If there is no
>objection, I am happy to create a JIRA to track that.
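
To make the proposal concrete, here is a minimal sketch on the local filesystem using only java.nio. It is not Hadoop code: the class name, the method names, and the single-rename commit are assumptions for illustration, and the "skipTaskCommit" flag merely stands in for the proposed (not yet existing) "mapreduce.skip.task.commit" parameter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkipTaskCommitSketch {

    /** Simulates one task writing its output, then optionally committing it by rename. */
    public static Path runTask(Path dst, String attemptId, boolean skipTaskCommit) {
        try {
            // Task output always starts under the temporary attempt directory.
            Path attemptDir = dst.resolve("_temporary/0/_temporary").resolve(attemptId);
            Files.createDirectories(attemptDir);
            Path written = Files.write(attemptDir.resolve("part-00000"), "k\tv\n".getBytes());

            if (skipTaskCommit) {
                // Proposed behaviour (hypothetical flag): leave the file where it is.
                return written;
            }
            // Simplified commit step: a single rename from the temp dir into dst/.
            return Files.move(written, dst.resolve("part-00000"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What a downstream reader sees if it walks dst/ recursively. */
    public static List<Path> listOutputs(Path dst) {
        try (Stream<Path> walk = Files.walk(dst)) {
            return walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch directory that plays the role of "dst/". */
    public static Path newTempDir() {
        try {
            return Files.createTempDirectory("dst");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path committed = newTempDir();
        runTask(committed, "attempt_0", false);
        System.out.println("with commit:    " + listOutputs(committed));

        Path skipped = newTempDir();
        runTask(skipped, "attempt_0", true);
        System.out.println("commit skipped: " + listOutputs(skipped));
    }
}
```

In both cases a recursive listing of "dst/" finds exactly one output file; only its depth differs, which is the property the proposal relies on. Whether a given input format actually descends into "_temporary" directories is a separate question that this sketch does not address.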
was (Author: airbots):
>From our experience, the main renaming overhead comes from
>"FileOutputCommitter.commitTask()", because it moves the files from the temp
>dir to the dest dir. Some frameworks may not care whether the final task files
>are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add a
>parameter such as "mapreduce.skip.task.commit" (default false), so that
>once a task is done, its output just stays in "dst/_temporary/0/_temporary/"?
>Then the next job or application just needs to take "dst/" as its input dir;
>it does not care how deep the files are. This avoids the atomic write
>issue, provides compatibility, and avoids the rename overhead. If there is no
>objection, I will create a JIRA to track that.
> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/s3, fs/swift
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
> Assignee: Pieter Reuse
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch,
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch,
> HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are
> really blobstores, with different atomicity and consistency guarantees, by
> adding a {{Blobstore}} interface to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that
> all blobstores implement a server-side copy operation as a substitute for
> rename.
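
As a rough illustration of the idea in the description, the sketch below shows what such an interface might look like. It is hypothetical and is not taken from the attached patches: the nested type names, the use of String keys in place of Hadoop's Path, and the in-memory store are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

public class BlobstoreSketch {

    /** Hypothetical interface; a FileSystem backed by an object store would implement it. */
    public interface Blobstore {
        /** Copies the object at src to dst inside the store; returns false if src is absent. */
        boolean copy(String src, String dst);
    }

    /** Toy in-memory store standing in for a real object store. */
    public static class InMemoryBlobstore implements Blobstore {
        private final Map<String, byte[]> objects = new HashMap<>();

        public void put(String key, byte[] data) {
            objects.put(key, data.clone());
        }

        public boolean exists(String key) {
            return objects.containsKey(key);
        }

        public void delete(String key) {
            objects.remove(key);
        }

        @Override
        public boolean copy(String src, String dst) {
            byte[] data = objects.get(src);
            if (data == null) {
                return false;
            }
            objects.put(dst, data.clone());
            return true;
        }
    }

    /** "Rename" emulated as copy followed by delete: two steps, so it is not atomic. */
    public static boolean renameViaCopy(InMemoryBlobstore store, String src, String dst) {
        if (!store.copy(src, dst)) {
            return false;
        }
        store.delete(src);
        return true;
    }

    /** Callers can detect a blobstore with instanceof and adjust their assumptions. */
    public static boolean isBlobstore(Object fs) {
        return fs instanceof Blobstore;
    }

    public static void main(String[] args) {
        InMemoryBlobstore store = new InMemoryBlobstore();
        store.put("dst/_temporary/part-00000", new byte[] {1, 2, 3});
        System.out.println("renamed: "
                + renameViaCopy(store, "dst/_temporary/part-00000", "dst/part-00000"));
        System.out.println("is blobstore: " + isBlobstore(store));
    }
}
```

The point of the marker is that code such as a committer could check for it and avoid relying on rename being atomic, since on such a store rename decomposes into a copy and a delete.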