[
https://issues.apache.org/jira/browse/HADOOP-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564518#action_12564518
]
Bryan Duxbury commented on HADOOP-2731:
---------------------------------------
After about 45% of 1 million 10KB rows had been imported, the import started to
slow down markedly. I did a little DFS digging to get a sense of the size of
the mapfiles:
{code}
[EMAIL PROTECTED] hadoop]$ bin/hadoop dfs -lsr / | grep test_table | grep
"mapfiles/[^/]*/data" | grep -v compaction.dir | awk '{print $4}' | sort -n |
awk '{print $1 / 1024 / 1024}'
0
0.589743
21.5422
29.4829
36.4409
36.834
54.6908
56.6071
60.0075
61.7568
64
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.3218
65.3046
68.1251
68.9211
71.2503
73.2158
73.9037
77.5301
82.1786
83.0631
83.1417
88.94
92.9497
98.2762
111.76
112.399
116.162
119.337
127.572
128.496
657.9
760.569
1261.14
1564.22
{code}
(If you can't read awk: that's the size in megabytes of each mapfile in the DFS
for my test table.)
There are only 7 regions, and the biggest single mapfile is over 1.5 GiB. I will
report again when the job has completed and the cluster has had a chance to cool
down.
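For anyone who wants to roll that listing up per region, here is a rough sketch
of the same pipeline grouped by region directory. It assumes the same -lsr output
format as above (path in the first field, size in the fourth) and a
<table>/<region>/<family>/mapfiles/<id>/data path layout; the path indexing is an
assumption and may need adjusting if your region directories sit elsewhere.
{code}
# Sketch: mapfile count and total size per region.
# Assumes path is field 1 and size (bytes) is field 4 of "dfs -lsr" output,
# and that the region directory is four path components above the "data" file.
bin/hadoop dfs -lsr / | grep test_table | grep "mapfiles/[^/]*/data" \
  | grep -v compaction.dir \
  | awk '{ n = split($1, p, "/"); r = p[n-4]; files[r]++; mb[r] += $4/1024/1024 }
         END { for (r in files) printf "%s\t%d mapfiles\t%.1f MB total\n", r, files[r], mb[r] }'
{code}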
> [hbase] Under load, regions become extremely large and eventually cause
> region servers to become unresponsive
> -------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2731
> URL: https://issues.apache.org/jira/browse/HADOOP-2731
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: Bryan Duxbury
> Attachments: split-v8.patch, split-v9.patch, split.patch
>
>
> When attempting to write to HBase as fast as possible, HBase accepts puts at
> a reasonably high rate for a while, and then the rate begins to drop off,
> ultimately culminating in exceptions reaching client code. In my testing, I
> was able to write about 370 10KB records a second to HBase until I reached
> around 1 million rows written. At that point, a moderate to large number of
> exceptions (NotServingRegionException, WrongRegionException, region offline,
> etc.) begin reaching the client code. This appears to be because the
> retry-and-wait logic in HTable runs out of retries and fails.
> Looking at mapfiles for the regions from the command line shows that some of
> the mapfiles are between 1 and 2 GB in size, much more than the stated file
> size limit. From talking with Stack, one possible explanation for this is that
> the RegionServer is not choosing to compact files often enough, letting many
> small mapfiles pile up that, when they are finally compacted, merge into a few
> overlarge mapfiles. Then, when
> the time comes to do a split or "major" compaction, it takes an unexpectedly
> long time to complete these operations. This translates into errors for the
> client application.
> If I back off the import process and give the cluster some quiet time, some
> splits and compactions clearly do take place, because the number of regions
> goes up and the number of mapfiles per region goes down. I can then begin
> writing again in earnest for a short period of time until the problem begins again.
> Both Marc Harris and I have seen this behavior.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.