My DFS appears healthy. After the PE fails, the datanodes are still
running but all the HRegionServers have exited. My initial concern is
free hard drive space or memory. Each node has ~1.5 GB of free space
for DFS and 400 MB RAM / 256 MB swap. Is this enough for the PE? I
monitored the free space while the PE ran and it never completely
filled up, but it was tight.
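For reference, this is roughly how I have been checking DFS health and free space while the PE runs. `dfsadmin` and `fsck` are the stock Hadoop commands; the `/tmp/hadoop-kcd` path is the one that appears in the logs below.

```shell
# Ask the namenode how many datanodes it sees and how much capacity
# remains. "0 datanodes available" here would explain the
# "could only be replicated to 0 nodes" errors.
bin/hadoop dfsadmin -report

# Check the HBase directory for missing or under-replicated blocks
# (path taken from the region server logs below).
bin/hadoop fsck /tmp/hadoop-kcd/hbase

# On each node, watch local free space under the DFS data directory.
df -h /tmp
```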
On Nov 15, 2007 8:01 PM, stack <[EMAIL PROTECTED]> wrote:
> Your DFS is healthy? This seems odd: "File
> /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
> could only be replicated to 0 nodes, instead of 1;" In my experience, IIRC,
> it means no datanodes running.
>
> (I just tried the PE from TRUNK and it ran to completion).
>
> St.Ack
>
>
> Kareem Dana wrote:
> > I'm trying to run the HBase PerformanceEvaluation program on a cluster
> > of 5 hadoop nodes (on virtual machines).
> >
> > hadoop07 is a DFS Master and HBase master
> > hadoop08-12 are HBase region servers
> >
> > I start the test as follows:
> >
> > $ bin/hadoop jar
> > ${HADOOP_HOME}/build/contrib/hbase/hadoop-0.15.0-dev-hbase-test.jar
> > sequentialWrite 2
> >
> > This starts the sequentialWrite test with 2 clients. After about 25
> > minutes, with the map tasks about 25% complete and reduce at 6%, the
> > test fails with the following error:
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.TaskInProgress:
> > TaskInProgress tip_200711151626_0001_m_000002 has failed 1 times.
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.JobInProgress:
> > Aborting job job_200711151626_0001
> > 2007-11-15 17:06:35,101 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from task_200711151626_0001_m_000006_0:
> > org.apache.hadoop.hbase.NoServerForRegionException: failed to find
> > server for TestTable after 5 retries
> >     at org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:761)
> >     at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:521)
> >     at org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:317)
> >     at org.apache.hadoop.hbase.HTable.commit(HTable.java:671)
> >     at org.apache.hadoop.hbase.HTable.commit(HTable.java:636)
> >     at org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:493)
> >     at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:356)
> >     at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:529)
> >     at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:184)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >
> > An HBase region server log shows these errors:
> > 2007-11-15 17:03:00,017 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,2102165,6843477525281170954
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File
> > /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
> > could only be replicated to 0 nodes, instead of 1
> >     at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >     at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:585)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >     at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >     at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >     at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >     at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >     at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,615 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,3147654,8929124532081908894
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File
> > /tmp/hadoop-kcd/hbase/hregion_TestTable,3147654,8929124532081908894/info/mapfiles/3451857497397493742/data
> > could only be replicated to 0 nodes, instead of 1
> >     at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >     at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:585)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >     at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >     at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >     at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >     at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >     at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,639 ERROR org.apache.hadoop.hbase.HRegionServer:
> > Close and delete failed
> > java.io.IOException: java.io.IOException: File
> > /tmp/hadoop-kcd/hbase/log_172.16.6.57_-3889232888673408171_60020/hlog.dat.005
> > could only be replicated to 0 nodes, instead of 1
> >     at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >     at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:585)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
> >     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> >     at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
> >     at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:597)
> >     at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,640 INFO org.apache.hadoop.hbase.HRegionServer:
> > telling master that region server is shutting down at:
> > 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > stopping server at: 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > regionserver/0.0.0.0:60020 exiting
> >
> > I can provide more logs if necessary. Any ideas or suggestions on
> > how to track this down? Running the sequentialWrite test with just 1
> > client works fine, but using 2 or more causes these errors.
> >
> > Thanks for any help,
> > Kareem Dana
> >
>
>