[ 
https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898262#comment-13898262
 ] 

Jerry He commented on HBASE-10492:
----------------------------------

The machines are 24 CPU 48G memory with 
Red Hat Enterprise Linux Server release 6.4 (Santiago) 2.6.32-358.el6.x86_64
IBM JDK 6
5 region servers (each with datanode and task tracker).  The load MR job with 
loading of data.
I have been trying to reproduce the long delay in opening the daughter regions.
With 'org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 200' I have 
seen delays up to 6 mins.
See the log below (from 2014-02-11 02:35:52 to 2014-02-11 02:41:14 at the end)
{code}
2014-02-11 02:35:52,473 WARN 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not 
found for region: 10a421ac8075a42cbcb53bdc393c8e8c
2014-02-11 02:35:52,479 WARN 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not 
found for region: 5ff07e59d13c99ca14408807a6e61722
2014-02-11 02:35:52,589 INFO 
org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size 
[4194304, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 
5.000000; throttle point 2684354560; delete expired; major period 0, major 
jitter 0.500000
2014-02-11 02:35:52,596 INFO 
org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size 
[4194304, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 
5.000000; throttle point 2684354560; delete expired; major period 0, major 
jitter 0.500000
2014-02-11 02:35:55,458 INFO 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, 
sequenceid=4289924, memsize=256.6 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/ed4d9fb392ae52c1a406a221defc6b00/.tmp/9e2cb318b0114248b9c62948cf47ac5b
2014-02-11 02:36:37,894 INFO 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, 
sequenceid=4289926, memsize=153.1 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/110cc21c77569d595f7717b8c75fbf66/.tmp/4e55d6ba4b5644838163101f2ba20fdb
2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
Rolled WAL 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114789609
 with entries=416, filesize=578.7 M; new WAL 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114958416
2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409
 whose highest sequenceid is 4285071 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409
2014-02-11 02:36:53,162 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204
 whose highest sequenceid is 4285169 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204
2014-02-11 02:36:53,210 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023
 whose highest sequenceid is 4285266 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023
2014-02-11 02:37:13,297 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511
 whose highest sequenceid is 4285362 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511
2014-02-11 02:37:13,326 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587
 whose highest sequenceid is 4285453 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587
2014-02-11 02:37:13,383 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894
 whose highest sequenceid is 4285546 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894
2014-02-11 02:37:33,474 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408
 whose highest sequenceid is 4285641 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408
2014-02-11 02:37:33,481 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
gpfs:/hbase/data/default/TestTable/ed4d9fb392ae52c1a406a221defc6b00/info/9e2cb318b0114248b9c62948cf47ac5b,
 entries=225739, sequenceid=4289924, filesize=227.8 M
2014-02-11 02:37:33,481 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~256.6 M/269080888, currentsize=0/0 for region 
TestTable,00000000000000000083857650,1392100516043.ed4d9fb392ae52c1a406a221defc6b00.
 in 104430ms, sequenceid=4289924, compaction requested=true
2014-02-11 02:37:53,550 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112901166
 whose highest sequenceid is 4285739 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112901166
2014-02-11 02:38:13,610 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112957011
 whose highest sequenceid is 4285845 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112957011
2014-02-11 02:38:13,657 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112996569
 whose highest sequenceid is 4285939 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112996569
2014-02-11 02:38:13,709 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
gpfs:/hbase/data/default/TestTable/110cc21c77569d595f7717b8c75fbf66/info/4e55d6ba4b5644838163101f2ba20fdb,
 entries=134636, sequenceid=4289926, filesize=135.9 M
2014-02-11 02:38:13,709 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~154.6 M/162140608, currentsize=50.6 M/53094064 for 
region 
TestTable,00000000000000000097375233,1392102079505.110cc21c77569d595f7717b8c75fbf66.
 in 143570ms, sequenceid=4289926, compaction requested=true
2014-02-11 02:38:13,726 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113011433
 whose highest sequenceid is 4286032 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113011433
2014-02-11 02:38:13,763 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113031250
 whose highest sequenceid is 4286132 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113031250
2014-02-11 02:38:19,012 INFO 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, 
sequenceid=4290528, memsize=258.2 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/d2182e25d616eeccb29c50356ba7df3e/.tmp/1954e5e2f04f49139a75c77d9c71efbe
2014-02-11 02:38:53,871 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113067064
 whose highest sequenceid is 4286371 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113067064
2014-02-11 02:39:13,926 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113092341
 whose highest sequenceid is 4286698 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113092341
2014-02-11 02:39:17,850 INFO 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, 
sequenceid=4290603, memsize=260.3 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/d43376acb9509f28299e7c0373957502/.tmp/f01bfa141fc64f859115144284093e0a
2014-02-11 02:39:34,010 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113120811
 whose highest sequenceid is 4286993 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113120811
2014-02-11 02:39:54,068 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
moving old hlog file 
/hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113296100
 whose highest sequenceid is 4287325 to 
/hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392113296100
2014-02-11 02:40:54,179 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
gpfs:/hbase/data/default/TestTable/d2182e25d616eeccb29c50356ba7df3e/info/1954e5e2f04f49139a75c77d9c71efbe,
 entries=227127, sequenceid=4290528, filesize=229.2 M
2014-02-11 02:40:54,179 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~258.2 M/270735384, currentsize=0/0 for region 
TestTable,00000000000000000152484874,1392104467789.d2182e25d616eeccb29c50356ba7df3e.
 in 200657ms, sequenceid=4290528, compaction requested=true
2014-02-11 02:41:14,206 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined 10a421ac8075a42cbcb53bdc393c8e8c; next sequenceid=4289919
{code}



> open daughter regions can unpredictably take long time
> ------------------------------------------------------
>
>                 Key: HBASE-10492
>                 URL: https://issues.apache.org/jira/browse/HBASE-10492
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.96.0
>            Reporter: Jerry He
>
> I have seen during a stress testing client was getting 
> "RetriesExhaustedWithDetailsException: Failed 748 actions: 
> NotServingRegionException"
> On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to 
> SPLITTING_NEW, 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN.
> The corresponding time period on the region sever log is:
> {code}
> 2014-02-08 20:44:12,662 WARN 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not 
> found for region: 010c1981882d1a59201af5e2dc589d44
> 2014-02-08 20:44:12,666 WARN 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not 
> found for region: c2eb9b7971ca7f3fed3da86df5b788e7
> {code}
> There were no INFO related to these two regions until: (at the end see this: 
> Split took 57mins, 16sec)
> {code}
> 2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
> 2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
> 2014-02-08 21:41:14,032 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks 
> for 
> region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Updated row 
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
>  with server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,054 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy 
> task for 
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks 
> for 
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Completed compaction of 10 file(s) in cf of 
> tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.
>  into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is 
> 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute.
> 2014-02-08 21:41:14,059 INFO 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed 
> compaction: Request = 
> regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.,
>  storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883, 
> time=1391924373278861000; duration=1mins, 40sec
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Starting compaction on cf in region 
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Starting compaction of 10 file(s) in cf of 
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
>  into 
> tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp,
>  totalSize=709.7 M
> 2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Updated row 
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with 
> server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,066 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy 
> task for 
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,190 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta 
> updated, and report to master. 
> Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
>  new regions: 
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., 
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7..
>  Split took 57mins, 16sec
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to