Hi, Ted, Jimmy
There were not much activities showing up in the master log during the time
between. (The BackupLogCleaner is our log cleaner plugin that only checks
to see if the logs can be deleted)
There is no Info at all from
*2014-02-08 20:59:48,659*
*2014-02-08 21:21:47,445*
2014-02-08 20:35:55,910 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2
unreferenced parent region(s)
*2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382, server=
hdtest208.svl.ibm.com,60020,1391887547473}
2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:46:47,893 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
initiating session
2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000
2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session:
0x24412bcbe5d0684 closed
2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted
tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c.
2014-02-08 20:50:57,955 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
unreferenced parent region(s)
2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:59:47,626 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest201.svl.ibm.com/9.30.194.210:2181,
initiating session
2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest201.svl.ibm.com/9.30.194.210:2181,
sessionid = 0x34412bcbe93070c, negotiated timeout = 60000
2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session:
0x34412bcbe93070c closed
2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 20:59:47,641 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
%2C60020%2C1391887547473.1391919916110
2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:59:48,644 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
initiating session
2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000
2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session:
0x24412bcbe5d06c9 closed
2014-02-08 20:59:48,659 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
%2C60020%2C1391887547473.1391920146024
*2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
*2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 21:21:47,446 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
initiating session
2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
sessionid = 0x14412bcbe630833, negotiated timeout = 60000
2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session:
0x14412bcbe630833 closed
2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 21:39:47,561 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
initiating session
2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
sessionid = 0x14412bcbe630880, negotiated timeout = 60000
2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session:
0x14412bcbe630880 closed
2014-02-08 21:39:47,577 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com
%2C60020%2C1391887547484.1391917561641
2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
*2014-02-08 21:41*:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server=
hdtest208.svl.ibm.com,60020,1391887547473}
2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com
,60020,1391887547473
2*014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW,
ts=1391924474093, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473} to
{010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093,
server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
*2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473*
*2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
ts=1391924474093, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473} to
{c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
*2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473*
*2014-02-08 21:41:14,100 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
daughter
a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
daughter
b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com>,60020,1391887547473*
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE,
ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6
state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE,
ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3
state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN,
ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283, server=
hdtest208.svl.ibm.com,60020,1391887547473}
On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <[email protected]> wrote:
> Was the master very busy? How many regions were in transition during that
> period?
>
>
> On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <[email protected]> wrote:
>
> > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get more
> > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
> > 010c1981882d1a59201af5e2dc589d44. ?
> >
> > Cheers
> >
> >
> > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <[email protected]> wrote:
> >
> > > Hi, folks
> > >
> > > This is what I am seeing when running in a stress env. I am getting
> > > "RetriesExhaustedWithDetailsException: Failed 748 actions:
> > > NotServingRegionException"
> > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE to *
> > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from
> **SPLITTING_NEW
> > to
> > > OPEN.*
> > > Am I seeing anything wrong here?
> > >
> > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group
> FileSystemCounters
> > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter
> instead
> > > *2014-02-08 20:50:44,824* WARN
> > org.apache.hadoop.hbase.client.AsyncProcess:
> > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > NOT resubmitting.
> > >
> > >
> >
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> > > 2014-02-08 20:50:44,839 INFO
> > org.apache.sqoop.mapreduce.AutoProgressMapper:
> > > Auto-progress thread is finished. keepGoing=false
> > > 2014-02-08 20:50:44,842 INFO
> org.apache.hadoop.mapred.TaskLogsTruncater:
> > > Initializing logs' truncater with mapRetainSize=-1 and
> > reduceRetainSize=-1
> > > 2014-02-08 20:50:44,858 ERROR
> > > org.apache.hadoop.security.UserGroupInformation:
> > PriviledgedActionException
> > > as:hive (auth:SIMPLE)
> > >
> >
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> > > Failed 748 actions: NotServingRegionException*: 748 times,
> > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
> > running
> > > child
> > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > Failed
> > > 748 actions: NotServingRegionException: 748 times,
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> > >
> > >
> > > *2014-02-08 20:43:57,382 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > > state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> > > server=
> > > hdtest208.svl.ibm.com,60020,1391887547473}
> > >
> > >
> > > *2014-02-08 21:41:14,093 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {010c1981882d1a59201af5e2dc589d44
> > > state=SPLITTING_NEW,*ts=1391924474093, server=
> > > hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {*010c1981882d1a59201af5e2dc589d44
> > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > 2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, server=
> > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > 2014-02-08 21:41:14,094 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > 2014-02-08 21:41:14,100 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
> > >
> > >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > > daughter
> > >
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > daughter
> > >
> > >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > > on hdtest208.svl.ibm.com,60020,1391887547473
> > >
> >
>