[
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420446#comment-16420446
]
Hudson commented on HADOOP-14999:
---------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13906 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/13906/])
HADOOP-14999. AliyunOSS: provide one asynchronous multi-part based (sammi.chen:
rev 6542d17ea460ec222137c4b275b13daf15d3fca3)
* (add)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSBlockOutputStream.java
* (edit)
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/contract/TestAliyunOSSContractDistCp.java
* (edit)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSFileSystemStore.java
* (delete)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSOutputStream.java
* (edit)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
* (add)
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSBlockOutputStream.java
* (edit)
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSInputStream.java
* (edit)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunCredentialsProvider.java
* (delete)
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSOutputStream.java
* (edit)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSFileSystem.java
* (edit)
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSUtils.java
> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> ------------------------------------------------------------------------
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch,
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch,
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch,
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch,
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed to upload files in parallel and asynchronously:
> - Improve the performance of uploading files to the OSS server. First, the
> mechanism splits the result into multiple small blocks and uploads them in
> parallel. Second, producing the result and uploading the blocks happen
> asynchronously.
> - Avoid buffering too much of the result on local disk. To cite an extreme
> example, a task that outputs 100GB or more would have to write all of that
> output to local disk and then upload it. This is inefficient and is limited
> by the available disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference
> between the previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: the whole result must be written to local
> disk before it can be uploaded to OSS. This poses two problems:
> - if the output file is too large, it will exhaust the local disk.
> - if the output file is too large, the task waits a long time for the upload
> to OSS before it can finish, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: the task output is cut into small blocks,
> i.e. small local files, and each block is packaged into an upload task.
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads
> the blocks in parallel instead of uploading one large file at the end.
> 3. Each task will retry 3 times to upload its block to Aliyun OSS. If one of
> those tasks fails, the whole file upload fails, and the current upload is
> aborted.
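
The block-based flow described above (cut the output into blocks, upload
them in parallel with a bounded number in flight, retry each block, and
fail the whole upload if any block fails) can be sketched as follows. This
is only an illustration under stated assumptions, not the code from the
patch: PartUploader, BlockUploadSketch and all method names are
hypothetical, a plain java.util.concurrent.Semaphore stands in for
SemaphoredDelegatingExecutor, blocks are held in memory rather than in
local files, and "retry 3 times" is read as three attempts in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for the remote store; this is not the Aliyun OSS SDK. */
interface PartUploader {
  void uploadPart(int partNumber, byte[] block) throws Exception;
}

final class BlockUploadSketch {
  // Assumption: "retry 3 times" means at most three attempts per block.
  static final int MAX_ATTEMPTS = 3;

  /** Cut the output into fixed-size blocks; the last block may be shorter. */
  static List<byte[]> splitIntoBlocks(byte[] data, int blockSize) {
    List<byte[]> blocks = new ArrayList<>();
    for (int off = 0; off < data.length; off += blockSize) {
      int end = Math.min(data.length, off + blockSize);
      blocks.add(Arrays.copyOfRange(data, off, end));
    }
    return blocks;
  }

  /** Try one block up to MAX_ATTEMPTS times, then rethrow the last failure. */
  static void uploadWithRetry(PartUploader uploader, int partNumber, byte[] block)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        uploader.uploadPart(partNumber, block);
        return;
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  /**
   * Upload all blocks in parallel with at most maxPending uploads queued or
   * running. Returns false if any block failed after its retries; a real
   * caller would then abort the multipart upload.
   */
  static boolean uploadAll(List<byte[]> blocks, PartUploader uploader,
                           int threads, int maxPending) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Semaphore permits = new Semaphore(maxPending);
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < blocks.size(); i++) {
        final int partNumber = i + 1;
        final byte[] block = blocks.get(i);
        // The producer blocks here when too many uploads are in flight.
        permits.acquireUninterruptibly();
        futures.add(pool.submit(() -> {
          try {
            uploadWithRetry(uploader, partNumber, block);
            return null;
          } finally {
            permits.release();
          }
        }));
      }
      boolean ok = true;
      for (Future<?> f : futures) {
        try {
          f.get();
        } catch (ExecutionException e) {
          ok = false; // one failed block fails the whole upload
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          ok = false;
        }
      }
      return ok;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<byte[]> blocks = splitIntoBlocks(new byte[10], 4);
    boolean ok = uploadAll(blocks, (n, b) -> { /* pretend to upload */ }, 2, 2);
    System.out.println("all blocks uploaded: " + ok);
  }
}
```

Acquiring the permit before submitting is what keeps a fast producer from
queueing an unbounded number of blocks, which matches the stated goal of
not buffering the whole result locally.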
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)