Thanks, Andy. I'll give it a shot.
On Wed, Sep 16, 2009 at 3:18 PM, Andrew Purtell <[email protected]> wrote: > Hi Lucas, > > The log fragment you sent is actually after the problem occurred, and it > does not look like 1777 but there's really insufficient information to say > for sure. > > I find that the hypervisor steals 20-50% of CPU time on m1.large or > m1.xlarge instance types as reported by top. I don't recommend any instance > type other than c1.xlarge (0% stolen as reported by top). Also I would > recommend a cluster of 3 c1.xlarge instances to start. Also please have a > look at the troubleshooting wiki and be sure to follow the advice for items > #5, #6, and #7. > > Best regards, > > - Andy > > > > > ________________________________ > From: Lucas Nazário dos Santos <[email protected]> > To: [email protected] > Sent: Wednesday, September 16, 2009 10:48:53 AM > Subject: Re: HBase cracks on EC2 > > Thanks, Andy. > > It's only a test with one large instance (7.5Gb of memory and 4 EC2 compute > units). I've already tried a cluster with 3 small instances but the same > error occurred (I know small instances are not recommended and that is why > I'm trying large instances now). Can it be the number of instances I'm > using? > > Hadoop logs are clear. > > The line bellow caught my attention. I'm wondering if the problem can be > related to the issue http://issues.apache.org/jira/browse/HBASE-1777. > > 2009-09-16 09:02:59,846 FATAL > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: > Replay of hlog required. Forcing server shutdown > java.io.IOException: Server not running, aborting > at org.apache.hadoop.hbase. > regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.exists(HRegionServer.java:1775) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > 2009-09-16 09:02:59,863 INFO > org.apache.hadoop.hbase.regionserver.CompactSplitThread: > regionserver/10.209.207.80:60020.compactor exiting > > Lucas > > > > > On Wed, Sep 16, 2009 at 1:37 PM, Andrew Purtell <[email protected]> > wrote: > > > Lucas, > > > > > [...] ServerManager: 0 region servers, 1 dead, average load > > > > Is this a pseudo distributed test? I.e. everything running on one EC2 > > instance? > > > > What instance type are you using? > > > > The reported exceptions indicate a possible problem with DFS. Are there > > exceptions in the DataNode logs? Or the NameNode log? > > > > Best regards, > > > > - Andy > > > > > > > > > > ________________________________ > > From: Lucas Nazário dos Santos <[email protected]> > > To: [email protected] > > Sent: Wednesday, September 16, 2009 6:21:27 AM > > Subject: HBase cracks on EC2 > > > > Hi, > > > > I'm trying to migrate my solution to Amazon EC2 but the following errors > > crack HBase. I'm using HBase and Hadoop 0.20.0. Does anybody know what > may > > be the cause of it? > > > > Lucas > > > > > > > > HBASE MASTER > > 2009-09-16 09:02:41,271 INFO > org.apache.hadoop.hbase.master.ServerManager: > > 1 > > region servers, 0 dead, average load 6.0 > > 2009-09-16 09:02:41,453 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.rootScanner scanning meta region {server: > > 10.209.207.80:60020, > > regionname: -ROOT-,,0, startKey: <>} > > 2009-09-16 09:02:41,472 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.rootScanner scan of 1 row(s) of meta region {server: > > 10.209.207.80:60020, regionname: -ROOT-,,0, startKey: <>} complete > > 2009-09-16 09:02:41,553 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.metaScanner scanning meta region {server: > > 10.209.207.80:60020, > > regionname: .META.,,1, startKey: <>} > > 2009-09-16 09:02:41,604 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.metaScanner scan of 4 row(s) of meta region {server: > > 10.209.207.80:60020, regionname: .META.,,1, startKey: <>} complete > > 2009-09-16 09:02:41,605 INFO org.apache.hadoop.hbase.master.BaseScanner: > > All > > 1 .META. region(s) scanned > > 2009-09-16 09:03:01,905 INFO > org.apache.hadoop.hbase.master.ServerManager: > > domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743 znode > expired > > 2009-09-16 09:03:01,921 INFO > org.apache.hadoop.hbase.master.RegionManager: > > -ROOT- region unset (but not set to be reassigned) > > 2009-09-16 09:03:01,921 INFO > org.apache.hadoop.hbase.master.RegionManager: > > META region removed from onlineMetaRegions > > 2009-09-16 09:03:01,923 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of > > server domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743: > > logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, > > onlineMetaRegions.size(): 0 > > 2009-09-16 09:03:01,935 INFO org.apache.hadoop.hbase.regionserver.HLog: > > Splitting 2 hlog(s) in > > > > > hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743 > > 2009-09-16 09:03:01,943 WARN org.apache.hadoop.hbase.regionserver.HLog: > > Exception processing > > > > > hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743/hlog.dat.1253103102168 > > -- continuing. Possible DATA LOSS! > > java.io.IOException: wrong key class: > > org.apache.hadoop.hbase.regionserver.HLogKey is not class > > org.apache.hadoop.hbase.regionserver.transactional.THLogKey > > at > > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824) > > at > > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876) > > at > org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:880) > > at > org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802) > > at > > > > > org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274) > > at > > org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:492) > > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426) > > 2009-09-16 09:03:01,951 WARN org.apache.hadoop.hbase.regionserver.HLog: > > Exception processing > > > > > hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743/hlog.dat.1253105692784 > > -- continuing. Possible DATA LOSS! > > java.io.IOException: wrong key class: > > org.apache.hadoop.hbase.regionserver.HLogKey is not class > > org.apache.hadoop.hbase.regionserver.transactional.THLogKey > > at > > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824) > > at > > org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876) > > at > org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:880) > > at > org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802) > > at > > > > > org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274) > > at > > org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:492) > > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426) > > 2009-09-16 09:03:01,956 INFO org.apache.hadoop.hbase.regionserver.HLog: > > hlog > > file splitting completed in 29 millis for > > > > > hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743 > > 2009-09-16 09:03:01,956 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete, > > meta reassignment and scanning: > > 2009-09-16 09:03:01,956 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: > ProcessServerShutdown > > reassigning ROOT region > > 2009-09-16 09:03:01,956 INFO > org.apache.hadoop.hbase.master.RegionManager: > > -ROOT- region unset (but not set to be reassigned) > > 2009-09-16 09:03:01,956 INFO > org.apache.hadoop.hbase.master.RegionManager: > > ROOT inserted into regionsInTransition > > 2009-09-16 09:03:01,956 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: > ProcessServerShutdown > > setting to unassigned: {server: 10.209.207.80:60020, regionname: > > .META.,,1, > > startKey: <>} > > 2009-09-16 09:03:41,273 INFO > org.apache.hadoop.hbase.master.ServerManager: > > 0 > > region servers, 1 dead, average load > > NaN[domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743] > > > > > > > > HBASE REGIONSERVER > > 2009-09-16 09:02:56,571 WARN org.apache.hadoop.hdfs.DFSClient: > > NotReplicatedYetException sleeping > > /hbase/news/698339833/charset/3497317581208054515 retries left 1 > > 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: > DataStreamer > > Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: > File > > /hbase/news/698339833/charset/3497317581208054515 could only be > replicated > > to 0 nodes, instead of 1 > > at > > > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256) > > at > > > org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422) > > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Unknown Source) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) > > > > at org.apache.hadoop.ipc.Client.call(Client.java:739) > > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) > > at $Proxy1.addBlock(Unknown Source) > > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > > at > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > > at $Proxy1.addBlock(Unknown Source) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2875) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2757) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2048) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2234) > > > > 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: Error > > Recovery for block null bad datanode[0] nodes == null > > 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: Could not > > get > > block locations. Source file > > "/hbase/news/698339833/charset/3497317581208054515" - Aborting... > > 17581208054515" - Aborting... > > 2009-09-16 09:02:59,846 FATAL > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog > > required. Forcing server shutdown > > org.apache.hadoop.hbase.DroppedSnapshotException: region: > > news,,1253103842893 > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:950) > > at > > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:843) > > at > > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241) > > at > > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:149) > > Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: > File > > /hbase/news/698339833/charset/3497317581208054515 could only be > replicated > > to 0 nodes, instead of 1 > > at > > > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256) > > at > > > org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422) > > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Unknown Source) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) > > > > at org.apache.hadoop.ipc.Client.call(Client.java:739) > > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) > > at $Proxy1.addBlock(Unknown Source) > > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > > at > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > > at $Proxy1.addBlock(Unknown Source) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2875) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2757) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2048) > > at > > > > > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2234) > > 2009-09-16 09:02:59,849 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: > > request=0.0, regions=6, stores=27, storefiles=36, storefileIndexSize=0, > > memstoreSize=0, usedHeap=195, maxHeap=996, blockCacheSize=82678352, > > blockCacheFree=126368384, blockCacheCount=1168, blockCacheHitRatio=99 > > 2009-09-16 09:02:59,849 INFO > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: > > regionserver/10.209.207.80:60020.cacheFlusher exiting > > 2009-09-16 09:02:59,849 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 1086 on 60020, call exists([...@466057, row= > > http://www.bloomberg.com/news/commentary/marinis.html#, maxVersions=1, > > timeRange=[0,9223372036854775807), families=ALL) from > 10.209.207.80:47866: > > error: java.io.IOException: Server not running, aborting > > java.io.IOException: Server not running, aborting > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.exists(HRegionServer.java:1775) > > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > 2009-09-16 09:02:59,863 INFO > > org.apache.hadoop.hbase.regionserver.CompactSplitThread: > > regionserver/10.209.207.80:60020.compactor exiting > > 2009-09-16 09:03:00,027 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > > server on 60020 > > 2009-09-16 09:03:00,028 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 1 on 60020: exiting > > 2009-09-16 09:03:00,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 1092 on 60020: exiting > > 2009-09-16 09:03:00,042 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > > IPC > > Server listener on 60020 > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 3 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 4 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 7 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 8 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 9 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 10 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 18 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 19 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 20 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 22 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 23 on 60020: exiting > > 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 24 on 60020: exiting > > 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 25 on 60020: exiting > > 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 26 on 60020: exiting > > 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 27 on 60020: exiting > > 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 29 on 60020: exiting > > 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 30 on 60020: exiting > > 2009-09-16 09:03:01,898 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting > > 2009-09-16 09:03:01,898 INFO org.apache.zookeeper.ZooKeeper: Closing > > session: 0x123c2c3d8bb0001 > > 2009-09-16 09:03:01,899 INFO org.apache.zookeeper.ClientCnxn: Closing > > ClientCnxn for session: 0x123c2c3d8bb0001 > > 2009-09-16 09:03:01,905 INFO org.apache.zookeeper.ClientCnxn: Exception > > while closing send thread for session 0x123c2c3d8bb0001 : Read error rc = > > -1 > > java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] > > 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ClientCnxn: > Disconnecting > > ClientCnxn for session: 0x123c2c3d8bb0001 > > 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ZooKeeper: Session: > > 0x123c2c3d8bb0001 closed > > 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ClientCnxn: EventThread > > shut down > > 2009-09-16 09:03:02,110 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/ > > 10.209.207.80:60020 exiting > > 2009-09-16 09:03:03,603 INFO org.apache.hadoop.hbase.Leases: > > regionserver/10.209.207.80:60020.leaseChecker closing leases > > 2009-09-16 09:03:03,603 INFO org.apache.hadoop.hbase.Leases: > > regionserver/10.209.207.80:60020.leaseChecker closed leases > > 2009-09-16 09:03:03,604 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown > > thread. > > 2009-09-16 09:03:03,604 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread > > complete > > > > > > > > > > > > > > >
