[ 
https://issues.apache.org/jira/browse/HBASE-24541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134122#comment-17134122
 ] 

Wellington Chevreuil commented on HBASE-24541:
----------------------------------------------

Thanks for filing this, [~catalin.luca]. I have added you to the list of 
contributors to the HBase project and assigned this Jira to you. In the 
future, you should be able to assign new Jiras to yourself.

Regarding the described issue, does it affect branch-1 only? Can you 
re-submit this patch as a GitHub pull request for easier review? If it also 
affects the master branch, please open a PR against the master branch.

> Add support to run LoadIncrementalHFiles in a distributed manner
> ----------------------------------------------------------------
>
>                 Key: HBASE-24541
>                 URL: https://issues.apache.org/jira/browse/HBASE-24541
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, Performance
>    Affects Versions: 1.4.0
>            Reporter: Constantin-Catalin Luca
>            Assignee: Constantin-Catalin Luca
>            Priority: Minor
>         Attachments: HBASE_24541-1.4.0.patch
>
>
> LoadIncrementalHFiles takes a very long time to complete when running HBase 
> on top of S3 and attempting to bulkload 500K-700K files.
> The root cause is a combination of the higher latency of S3 (compared to 
> HDFS) and the calls made by LoadIncrementalHFiles to the underlying 
> filesystem (each file is opened, seeked to the trailer offset at the end, 
> and then the trailer is read).
> Increasing the parallelism does not yield any significant improvement. This 
> seems to stem from the fact that once the trailer is read the stream is not 
> consumed to the end. This causes the underlying HTTP connection to be aborted 
> and it cannot be re-used.
>  
> The proposed solution is to add support for running LoadIncrementalHFiles 
> on multiple machines as a MapReduce job.
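
For illustration only, the per-file access pattern described in the issue (open, seek to the trailer offset at the end, read the trailer) and the idea of spreading the file list across workers could be sketched roughly as below. This is not the HBase implementation and not the attached patch: it runs against local files, `TRAILER_SIZE` is a made-up value rather than the real HFile trailer length, and `read_trailer`, `split_files`, and `make_sample_file` are hypothetical helper names.

```python
import os

# Hypothetical trailer length for illustration; not the real HFile trailer size.
TRAILER_SIZE = 16


def read_trailer(path, trailer_size=TRAILER_SIZE):
    """Open a file, seek to the trailer offset at the end, and read the trailer."""
    with open(path, "rb") as f:
        f.seek(-trailer_size, os.SEEK_END)  # one seek per file
        return f.read(trailer_size)         # one short read per file


def split_files(paths, num_workers):
    """Deal file paths round-robin into per-worker lists.

    A stand-in for handing each map task its own share of the files.
    """
    return [paths[i::num_workers] for i in range(num_workers)]


def make_sample_file(directory, name, body, trailer):
    """Write a small local file whose last bytes act as the trailer."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(body + trailer)
    return path
```

On a high-latency store each `read_trailer` call costs at least one round trip, so with several hundred thousand files the total time is dominated by per-file latency. Splitting the list with something like `split_files` lets separate machines pay that cost in parallel.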



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
