[ https://issues.apache.org/jira/browse/HADOOP-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564527#action_12564527 ]

Bryan Duxbury commented on HADOOP-2731:
---------------------------------------

I may have spoken too soon. After a bit of a slowdown around the 45% mark, some 
splits burst through and the write rate recovered to what I expected it to be.

Right at the end of the job, I still have a bunch of big mapfiles:

{code}
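# sizes (in MB) of each data mapfile for test_table, compaction working dirs excluded, sorted ascending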
[EMAIL PROTECTED] hadoop]$ bin/hadoop dfs -lsr / | grep test_table | \
    grep "mapfiles/[^/]*/data" | grep -v compaction.dir | \
    awk '{print $4}' | sort -n | awk '{print $1 / 1024 / 1024}'
18.6529
18.987
20.3924
25.5912
30.4755
32.5393
57.0985
60.0075
60.2728
61.7568
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2235
64.2235
64.2432
64.2432
69.8449
75.3975
76.5179
77.766
79.3581
81.8543
82.6503
83.0631
88.94
90.8564
92.5664
97.2247
101.814
104.703
105.116
110.62
113.814
127.543
128.427
128.427
128.427
128.427
128.516
353.175
367.907
471.401
474.664
575.348
657.9
906.067
921.349
1578.89
{code}

Twenty-five minutes later, I've had a few more splits, bringing me up to 23 
regions overall, with only 40 mapfiles. Some of the files are still much larger 
than they should be.
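
For reference, that mapfile count can be pulled with a variant of the listing 
above; this is just a sketch reusing the same path pattern:

{code}
bin/hadoop dfs -lsr / | grep test_table | grep "mapfiles/[^/]*/data" | \
    grep -v compaction.dir | wc -l
{code}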

I definitely see this as having been an improvement. I just don't think it gets 
us the whole way yet.
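
For the record, the file-size limit I have in mind is the region split 
threshold. A quick way to check what the cluster is actually configured with is 
below; the property name (hbase.hregion.max.filesize) and the conf/ location 
are my assumptions from hbase-default.xml, so treat this as a sketch:

{code}
# Sketch: show the configured region split threshold, in bytes.
# Property name and conf/ location are assumed, not verified on this cluster.
grep -A 1 "hbase.hregion.max.filesize" conf/hbase-*.xml
{code}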

> [hbase] Under load, regions become extremely large and eventually cause 
> region servers to become unresponsive
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2731
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>         Attachments: split-v8.patch, split-v9.patch, split.patch
>
>
> When attempting to write to HBase as fast as possible, HBase accepts puts at 
> a reasonably high rate for a while, and then the rate begins to drop off, 
> ultimately culminating in exceptions reaching client code. In my testing, I 
> was able to write about 370 10KB records a second to HBase until I reached 
> around 1 million rows written. At that point, a moderate to large number of 
> exceptions - NotServingRegionException, WrongRegionException, region offline, 
> etc. - begin reaching the client code. This appears to be because the 
> retry-and-wait logic in HTable runs out of retries and fails.
>
> Looking at the mapfiles for the regions from the command line shows that some 
> of the mapfiles are between 1 and 2 GB in size, much more than the stated 
> file size limit. Talking with Stack, one possible explanation for this is 
> that the RegionServer is not choosing to compact files often enough, leading 
> to many small mapfiles, which in turn lead to a few overlarge mapfiles. Then, 
> when the time comes to do a split or "major" compaction, those operations 
> take an unexpectedly long time to complete. This translates into errors for 
> the client application.
>
> If I back off the import process and give the cluster some quiet time, some 
> splits and compactions clearly do take place, because the number of regions 
> goes up and the number of mapfiles per region goes down. I can then begin 
> writing again in earnest for a short period of time, until the problem 
> begins again.
>
> Both Marc Harris and I have seen this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
