[ https://issues.apache.org/jira/browse/HBASE-24541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148758#comment-17148758 ]
Constantin-Catalin Luca commented on HBASE-24541:
-------------------------------------------------
Hi [~wchevreuil]. Thanks for adding me to the list of contributors.
I have submitted a [pull request|https://github.com/apache/hbase/pull/2002].
I ran into this issue while working with a deployment of branch-1, but master
can also benefit from this optimisation.
However, the bulkload code has changed considerably between 1.4.0 and the
latest version of master, so the patch cannot easily be forward- or
back-ported. I can submit a separate pull request against master if needed.
> Add support to run LoadIncrementalHFiles in a distributed manner
> ----------------------------------------------------------------
>
> Key: HBASE-24541
> URL: https://issues.apache.org/jira/browse/HBASE-24541
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce, Performance
> Affects Versions: 1.4.0
> Reporter: Constantin-Catalin Luca
> Assignee: Constantin-Catalin Luca
> Priority: Minor
> Attachments: HBASE_24541-1.4.0.patch
>
>
> LoadIncrementalHFiles takes a very long time to complete when running HBase
> on top of S3 and attempting to bulkload 500K-700K files.
> The root cause is a combination of the higher latency of S3 (compared to
> HDFS) and the calls LoadIncrementalHFiles makes to the underlying
> filesystem (each file is opened, a seek is made to the trailer offset at
> the end, and then the trailer is read).
> Increasing the parallelism does not yield any significant improvement. This
> seems to be because the stream is not consumed to the end once the trailer
> has been read, so the underlying HTTP connection is aborted and cannot be
> re-used.
>
> The proposed solution is to add support for running LoadIncrementalHFiles
> on multiple machines as a MapReduce job.
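
As a rough illustration of the proposal above (this is not the code in the attached patch or the pull request), the list of HFiles could be divided among workers so that each machine opens and reads the trailers of only its own share. The sketch below assumes a simple round-robin split; the class name, the `partition` method, and the `s3://example-bucket/...` paths are hypothetical and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not an HBase class and not the submitted patch.
final class BulkLoadSplitSketch {

    // Assigns HFile paths to workers round-robin, so each worker
    // (for example, one map task) handles roughly the same number of files.
    static List<List<String>> partition(List<String> hfilePaths, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            splits.add(new ArrayList<>());
        }
        for (int i = 0; i < hfilePaths.size(); i++) {
            splits.get(i % workers).add(hfilePaths.get(i));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up paths, used only to exercise the split.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            paths.add("s3://example-bucket/staging/hfile-" + i);
        }
        List<List<String>> splits = partition(paths, 3);
        for (int w = 0; w < splits.size(); w++) {
            System.out.println("worker " + w + " -> " + splits.get(w).size() + " files");
        }
    }
}
```

In a real job each split would presumably be handed to a separate task that performs the per-file open, seek, and trailer read, so the S3 latency is paid in parallel across machines rather than on a single client.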