Can you take a look at the server log for hdtest208.svl.ibm.com during this period of time ?
Thanks On Sun, Feb 9, 2014 at 3:23 PM, Jerry He <[email protected]> wrote: > Hi, Ted, Jimmy > > There were not much activities showing up in the master log during the time > between. (The BackupLogCleaner is our log cleaner plugin that only checks > to see if the logs can be deleted) > There is no Info at all from > *2014-02-08 20:59:48,659* > *2014-02-08 21:21:47,445* > > 2014-02-08 20:35:55,910 INFO org.apache.hadoop.hbase.master.CatalogJanitor: > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2 > unreferenced parent region(s) > *2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE, > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44 > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473}* > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE, > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7 > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com > ,60020,1391887547473} > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN, > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382, > server= > hdtest208.svl.ibm.com,60020,1391887547473} > 2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=hdtest207.svl.ibm.com:2181, > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000 > watcher=Backup Client > 2014-02-08 20:46:47,893 INFO > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process > identifier=Backup Client connecting to ZooKeeper ensemble= > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181, > hdtest205.svl.ibm.com:2181 > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will > not attempt to authenticate using SASL (Unable to locate a login > configuration) > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181, > initiating session > 2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181, > sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000 > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x24412bcbe5d0684 closed > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > Deleted > tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c. > 2014-02-08 20:50:57,955 INFO org.apache.hadoop.hbase.master.CatalogJanitor: > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 > unreferenced parent region(s) > 2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=hdtest207.svl.ibm.com:2181, > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000 > watcher=Backup Client > 2014-02-08 20:59:47,626 INFO > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process > identifier=Backup Client connecting to ZooKeeper ensemble= > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181, > hdtest205.svl.ibm.com:2181 > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181. Will > not attempt to authenticate using SASL (Unable to locate a login > configuration) > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to hdtest201.svl.ibm.com/9.30.194.210:2181, > initiating session > 2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server hdtest201.svl.ibm.com/9.30.194.210:2181, > sessionid = 0x34412bcbe93070c, negotiated timeout = 60000 > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x34412bcbe93070c closed > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2014-02-08 20:59:47,641 INFO > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete: > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com > %2C60020%2C1391887547473.1391919916110 > 2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=hdtest207.svl.ibm.com:2181, > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000 > watcher=Backup Client > 2014-02-08 20:59:48,644 INFO > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process > identifier=Backup Client connecting to ZooKeeper ensemble= > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181, > hdtest205.svl.ibm.com:2181 > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will > not attempt to authenticate using SASL (Unable to locate a login > configuration) > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181, > initiating session > 2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181, > sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000 > 2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x24412bcbe5d06c9 closed > 2014-02-08 20:59:48,659 INFO > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete: > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com > %2C60020%2C1391887547473.1391920146024 > *2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > *2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=hdtest207.svl.ibm.com:2181, > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000 > watcher=Backup Client > 2014-02-08 21:21:47,446 INFO > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process > identifier=Backup Client connecting to ZooKeeper ensemble= > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181, > hdtest205.svl.ibm.com:2181 > 2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will > not attempt to authenticate using SASL (Unable to locate a login > configuration) > 2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181, > initiating session > 2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181, > sessionid = 0x14412bcbe630833, negotiated timeout = 60000 > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x14412bcbe630833 closed > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=hdtest207.svl.ibm.com:2181, > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000 > watcher=Backup Client > 2014-02-08 21:39:47,561 INFO > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process > identifier=Backup Client connecting to ZooKeeper ensemble= > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181, > hdtest205.svl.ibm.com:2181 > 2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will > not attempt to authenticate using SASL (Unable to locate a login > configuration) > 2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181, > initiating session > 2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181, > sessionid = 0x14412bcbe630880, negotiated timeout = 60000 > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x14412bcbe630880 closed > 2014-02-08 21:39:47,577 INFO > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete: > gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com > %2C60020%2C1391887547484.1391917561641 > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > *2014-02-08 21:41*:14,093 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to > {b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server= > hdtest208.svl.ibm.com,60020,1391887547473} > 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates: > Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com > ,60020,1391887547473 > 2*014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW, > ts=1391924474093, server=hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to > {010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093, > server=hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473}* > *2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates: > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473* > *2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW, > ts=1391924474093, server=hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, > server=hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473}* > *2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates: > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com > <http://hdtest208.svl.ibm.com>,60020,1391887547473* > *2014-02-08 21:41:14,100 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event; > > parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d., > daughter > a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., > daughter > > b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7., > on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com > >,60020,1391887547473* > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE, > ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6 > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com > ,60020,1391887547473} > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE, > ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3 > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com > ,60020,1391887547473} > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates: > Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN, > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to > {1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283, > server= > hdtest208.svl.ibm.com,60020,1391887547473} > > > > On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <[email protected]> wrote: > > > Was the master very busy? How many regions were in transition during that > > period? > > > > > > On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <[email protected]> wrote: > > > > > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get > more > > > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353. > > > 010c1981882d1a59201af5e2dc589d44. ? > > > > > > Cheers > > > > > > > > > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <[email protected]> wrote: > > > > > > > Hi, folks > > > > > > > > This is what I am seeing when running in a stress env. I am getting > > > > "RetriesExhaustedWithDetailsException: Failed 748 actions: > > > > NotServingRegionException" > > > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE > to * > > > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from > > **SPLITTING_NEW > > > to > > > > OPEN.* > > > > Am I seeing anything wrong here? > > > > > > > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group > > FileSystemCounters > > > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter > > instead > > > > *2014-02-08 20:50:44,824* WARN > > > org.apache.hadoop.hbase.client.AsyncProcess: > > > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com > > > > ,60020,1391887547473 > > > > NOT resubmitting. > > > > > > > > > > > > > > region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., > > > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1 > > > > 2014-02-08 20:50:44,839 INFO > > > org.apache.sqoop.mapreduce.AutoProgressMapper: > > > > Auto-progress thread is finished. keepGoing=false > > > > 2014-02-08 20:50:44,842 INFO > > org.apache.hadoop.mapred.TaskLogsTruncater: > > > > Initializing logs' truncater with mapRetainSize=-1 and > > > reduceRetainSize=-1 > > > > 2014-02-08 20:50:44,858 ERROR > > > > org.apache.hadoop.security.UserGroupInformation: > > > PriviledgedActionException > > > > as:hive (auth:SIMPLE) > > > > > > > > > > cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException: > > > > Failed 748 actions: NotServingRegionException*: 748 times, > > > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error > > > running > > > > child > > > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: > > > Failed > > > > 748 actions: NotServingRegionException: 748 times, > > > > at > > > > > > > > > > > > > > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185) > > > > at > > > > > > > > > > > > > > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169) > > > > at > > > > > > > > > > > > > > org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782) > > > > at > > > > > > > > > > > > > > org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934) > > > > > > > > > > > > *2014-02-08 20:43:57,382 INFO > > > org.apache.hadoop.hbase.master.RegionStates: > > > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE, > > > > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44 > > > > state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com > > > > ,60020,1391887547473} > > > > 2014-02-08 20:43:57,382 INFO > > org.apache.hadoop.hbase.master.RegionStates: > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE, > > > > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7 > > > > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com > > > > ,60020,1391887547473} > > > > 2014-02-08 20:43:57,382 INFO > > org.apache.hadoop.hbase.master.RegionStates: > > > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN, > > > > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} > to > > > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382, > > > > server= > > > > hdtest208.svl.ibm.com,60020,1391887547473} > > > > > > > > > > > > *2014-02-08 21:41:14,093 INFO > > > org.apache.hadoop.hbase.master.RegionStates: > > > > Transitioned {010c1981882d1a59201af5e2dc589d44 > > > > state=SPLITTING_NEW,*ts=1391924474093, server= > > > > hdtest208.svl.ibm.com,60020,1391887547473} to > > > > {*010c1981882d1a59201af5e2dc589d44 > > > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com > > > > ,60020,1391887547473} > > > > 2014-02-08 21:41:14,093 INFO > > org.apache.hadoop.hbase.master.RegionStates: > > > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com > > > > ,60020,1391887547473 > > > > 2014-02-08 21:41:14,093 INFO > > org.apache.hadoop.hbase.master.RegionStates: > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW, > > > > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} > to > > > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, > server= > > > > hdtest208.svl.ibm.com,60020,1391887547473} > > > > 2014-02-08 21:41:14,094 INFO > > org.apache.hadoop.hbase.master.RegionStates: > > > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com > > > > ,60020,1391887547473 > > > > 2014-02-08 21:41:14,100 INFO > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT > event; > > > > > > > > > > > > > > parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d., > > > > daughter > > > > > > > > > > a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., > > > > daughter > > > > > > > > > > > > > > b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7., > > > > on hdtest208.svl.ibm.com,60020,1391887547473 > > > > > > > > > >
