[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

Genmao Yu (JIRA) Sat, 13 Jan 2018 21:13:21 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Genmao Yu updated HADOOP-14999:
-------------------------------
    Description: 
This mechanism is designed for uploading files in parallel and asynchronously: 

- improve the performance of uploading files to the OSS server. First, this 
mechanism splits the result into multiple small blocks and uploads them in 
parallel. Second, getting the result and uploading the blocks happen 
asynchronously.
- avoid buffering too large a result on local disk. To cite an extreme example, 
a task may output 100GB or even more; we may need to write this 100GB to local 
disk and then upload it. This is sometimes inefficient and is limited by the 
available disk space.

This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
depends on HADOOP-15039. 
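
The idea described above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the code in the attached patches: 
{{AsyncBlockUploadSketch}}, {{PartUploader}}, {{readBlock}} and the parameter 
names are hypothetical, the OSS "upload part" call is replaced by a caller 
supplied callback, and a plain {{java.util.concurrent.Semaphore}} stands in for 
the bounding that {{SemaphoredDelegatingExecutor}} provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Illustrative sketch only; not the implementation in HADOOP-14999. */
class AsyncBlockUploadSketch {

  /** Hypothetical stand-in for an OSS "upload part" call; returns a part tag. */
  interface PartUploader {
    String uploadPart(int partNumber, byte[] block) throws IOException;
  }

  /** Reads up to blockSize bytes; returns an empty array at end of stream. */
  static byte[] readBlock(InputStream in, int blockSize) throws IOException {
    byte[] buf = new byte[blockSize];
    int filled = 0;
    while (filled < blockSize) {
      int n = in.read(buf, filled, blockSize - filled);
      if (n < 0) {
        break;
      }
      filled += n;
    }
    return Arrays.copyOf(buf, filled);
  }

  /**
   * Splits the stream into blocks and uploads them on the pool while the
   * caller keeps reading. At most maxActiveBlocks uploads are queued or
   * running at once, so the data never has to be buffered in full.
   * Returns the part tags in part-number order.
   */
  static List<String> upload(InputStream in, int blockSize, int maxActiveBlocks,
      ExecutorService pool, PartUploader uploader)
      throws IOException, InterruptedException, ExecutionException {
    Semaphore permits = new Semaphore(maxActiveBlocks);
    List<Future<String>> pending = new ArrayList<>();
    int partNumber = 1;
    while (true) {
      final byte[] block = readBlock(in, blockSize);
      if (block.length == 0) {
        break;
      }
      final int n = partNumber++;
      permits.acquire();  // the reader waits here when too many uploads are active
      try {
        pending.add(pool.submit(() -> {
          try {
            return uploader.uploadPart(n, block);
          } finally {
            permits.release();  // free the slot whether the upload succeeded or not
          }
        }));
      } catch (RuntimeException e) {
        permits.release();  // the task was rejected, so it will never release
        throw e;
      }
    }
    List<String> tags = new ArrayList<>();
    for (Future<String> f : pending) {
      tags.add(f.get());  // rethrows any upload failure as ExecutionException
    }
    return tags;
  }
}
```

A caller would pass the task output stream, a block size, a limit on active 
blocks, a thread pool, and a callback that performs the real part upload. A real 
multi-part upload would also need to initiate the upload before the first part 
and complete or abort it afterwards, which this sketch leaves out.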

The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
between the previous {{AliyunOSSOutputStream}} and this 
  was:
This mechanism is designed for uploading files in parallel and asynchronously: 

- improve the performance of uploading files to the OSS server. First, this 
mechanism splits the result into multiple small blocks and uploads them in 
parallel. Second, getting the result and uploading the blocks happen 
asynchronously.
- avoid buffering too large a result on local disk. To cite an extreme example, 
a task may output 100GB or even more; we may need to write this 100GB to local 
disk and then upload it. This is sometimes inefficient and is limited by the 
available disk space.

This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
depends on HADOOP-15039. 


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14999
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0-beta1
>            Reporter: Genmao Yu
>            Assignee: Genmao Yu
>         Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously: 
> - improve the performance of uploading files to the OSS server. First, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Second, getting the result and uploading the blocks happen 
> asynchronously.
> - avoid buffering too large a result on local disk. To cite an extreme 
> example, a task may output 100GB or even more; we may need to write this 
> 100GB to local disk and then upload it. This is sometimes inefficient and is 
> limited by the available disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039. 
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and this 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

