[
https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896937#comment-13896937
]
Jerry He commented on HBASE-10492:
----------------------------------
The problem is probably caused by this part of the code in
SplitTransaction.openDaughters()
{code}
// Open daughters in parallel.
DaughterOpener aOpener = new DaughterOpener(server, a);
DaughterOpener bOpener = new DaughterOpener(server, b);
aOpener.start();
bOpener.start();
try {
aOpener.join();
bOpener.join();
}
{code}
We are opening the daughter regions in separate new threads. It is possible,
although rare, due to issues like thread scheduling, the daughter regions may
not be opened until after a long time.
> open daughter regions can unpredictably take long time
> ------------------------------------------------------
>
> Key: HBASE-10492
> URL: https://issues.apache.org/jira/browse/HBASE-10492
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.96.0
> Reporter: Jerry He
>
> I have seen during a stress testing client was getting
> "RetriesExhaustedWithDetailsException: Failed 748 actions:
> NotServingRegionException"
> On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to
> SPLITTING_NEW, 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN.
> The corresponding time period on the region sever log is:
> {code}
> 2014-02-08 20:44:12,662 WARN
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not
> found for region: 010c1981882d1a59201af5e2dc589d44
> 2014-02-08 20:44:12,666 WARN
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not
> found for region: c2eb9b7971ca7f3fed3da86df5b788e7
> {code}
> There were no INFO related to these two regions until: (at the end see this:
> Split took 57mins, 16sec)
> {code}
> 2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
> 2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
> 2014-02-08 21:41:14,032 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Updated row
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> with server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,054 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy
> task for
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Completed compaction of 10 file(s) in cf of
> tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.
> into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is
> 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute.
> 2014-02-08 21:41:14,059 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed
> compaction: Request =
> regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.,
> storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883,
> time=1391924373278861000; duration=1mins, 40sec
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on cf in region
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Starting compaction of 10 file(s) in cf of
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> into
> tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp,
> totalSize=709.7 M
> 2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Updated row
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with
> server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,066 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy
> task for
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,190 INFO
> org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta
> updated, and report to master.
> Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> new regions:
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7..
> Split took 57mins, 16sec
> {code}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)