[jira] [Commented] (HBASE-13034) Importing rows with bulkupload can overload single regionservers

Bryant Khau (JIRA) Fri, 13 Feb 2015 18:08:00 -0800

    [ https://issues.apache.org/jira/browse/HBASE-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321145#comment-14321145 ]


Bryant Khau commented on HBASE-13034:
-------------------------------------

I'm using normal online mode.

I'm importing 3.5 TB of data into a table presplit into 100 regions, with 4 
region servers. The export tool generated 200 "parts" of about 17 GB each. Each 
region server would be hammered by requests for about 15 minutes before the 
import moved on to the next region and hit another region server. This severely 
limits the import speed, since throughput is bottlenecked by what a single 
region server can handle.
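
To make the effect concrete, here is a rough toy simulation, not a measurement 
of a real cluster. It uses the 100 regions and 4 region servers from above; 
everything else is an assumption made for illustration: the row count, the 
32-bit uniform keys, evenly spaced split points, round-robin placement of 
regions on servers, and the size of a "window" of consecutive writes. The 
shuffled order is only a baseline for comparison, not something Import does.

```python
import bisect
import random
from collections import Counter

NUM_REGIONS = 100    # from the comment: 100 presplit regions
NUM_SERVERS = 4      # from the comment: 4 region servers
NUM_ROWS = 100_000   # assumption: toy row count
WINDOW = 100         # assumption: consecutive rows written at "one moment"
KEY_SPACE = 2 ** 32  # assumption: hashed keys, uniform over 32 bits

# Assumption: evenly spaced split points over the key space.
SPLITS = [KEY_SPACE * i // NUM_REGIONS for i in range(1, NUM_REGIONS)]


def region_of(key):
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)


def server_of(key):
    """Assumption: regions are placed on servers round-robin."""
    return region_of(key) % NUM_SERVERS


def peak_share(keys):
    """Average, over all windows, of the busiest server's share of writes."""
    shares = []
    for start in range(0, len(keys), WINDOW):
        window = keys[start:start + WINDOW]
        counts = Counter(server_of(k) for k in window)
        shares.append(max(counts.values()) / len(window))
    return sum(shares) / len(shares)


rng = random.Random(0)
keys = [rng.randrange(KEY_SPACE) for _ in range(NUM_ROWS)]

# Sorted order stands in for the order of rows in the exported file.
sorted_share = peak_share(sorted(keys))

# Shuffled order is the comparison baseline.
shuffled = keys[:]
rng.shuffle(shuffled)
shuffled_share = peak_share(shuffled)

print(f"busiest server's share per window, sorted:   {sorted_share:.2f}")
print(f"busiest server's share per window, shuffled: {shuffled_share:.2f}")
```

In this toy model a share near 1.0 means a single server is taking almost all 
of the writes at any given moment, while a share near 1/4 means the load is 
spread evenly over the 4 servers.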

> Importing rows with bulkupload can overload single regionservers
> ----------------------------------------------------------------
>
>                 Key: HBASE-13034
>                 URL: https://issues.apache.org/jira/browse/HBASE-13034
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.98.0
>            Reporter: Bryant Khau
>            Priority: Minor
>
> Exporting a table with a common schema, such as one keyed by hashes, results 
> in a sorted export file. When it is imported with 
> org.apache.hadoop.hbase.mapreduce.Import, the MapReduce job's requests can 
> overload region servers one at a time, since the rows are imported in 
> sequential order and regions span key ranges in sequential order. This is more 
> likely to happen with a lot of data and relatively few regions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
