Is your DFS healthy? This seems odd: "File /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data could only be replicated to 0 nodes, instead of 1;" In my experience, that usually means no datanodes are running.
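If you want to confirm, something along these lines (assuming a stock Hadoop install, run from the namenode host) will show how many datanodes the namenode currently sees:

$ bin/hadoop dfsadmin -report

It should report one live node per datanode you started; zero live nodes would explain the replication error above.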

(I just tried the PE from TRUNK and it ran to completion).

St.Ack

Kareem Dana wrote:
I'm trying to run the HBase PerformanceEvaluation program on a cluster
of 5 hadoop nodes (on virtual machines).

hadoop07 is the DFS master and HBase master
hadoop08-12 are HBase region servers

I start the test as follows:

$ bin/hadoop jar
${HADOOP_HOME}build/contrib/hbase/hadoop-0.15.0-dev-hbase-test.jar
sequentialWrite 2

This starts the sequentialWrite test with 2 clients. After about 25
minutes, with the map tasks about 25% complete and the reduces at 6%,
the test fails with the following error:
2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.TaskInProgress:
TaskInProgress tip_200711151626_0001_m_000002 has failed 1 times.
2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.JobInProgress:
Aborting job job_200711151626_0001
2007-11-15 17:06:35,101 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from task_200711151626_0001_m_000006_0:
org.apache.hadoop.hbase.NoServerForRegionException: failed to find
server for TestTable after 5 retries
        at 
org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:761)
        at 
org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:521)
        at 
org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:317)
        at org.apache.hadoop.hbase.HTable.commit(HTable.java:671)
        at org.apache.hadoop.hbase.HTable.commit(HTable.java:636)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:493)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:356)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:529)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:184)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
        

An HBase region server log shows these errors:
2007-11-15 17:03:00,017 ERROR org.apache.hadoop.hbase.HRegionServer:
error closing region TestTable,2102165,6843477525281170954
org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
File 
/tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
could only be replicated to 0 nodes, instead of 1
        at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
        at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
        at 
org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
        at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
        at java.lang.Thread.run(Thread.java:595)
2007-11-15 17:03:00,615 ERROR org.apache.hadoop.hbase.HRegionServer:
error closing region TestTable,3147654,8929124532081908894
org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
File 
/tmp/hadoop-kcd/hbase/hregion_TestTable,3147654,8929124532081908894/info/mapfiles/3451857497397493742/data
could only be replicated to 0 nodes, instead of 1
        at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
        at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
        at 
org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
        at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
        at java.lang.Thread.run(Thread.java:595)
2007-11-15 17:03:00,639 ERROR org.apache.hadoop.hbase.HRegionServer:
Close and delete failed
java.io.IOException: java.io.IOException: File
/tmp/hadoop-kcd/hbase/log_172.16.6.57_-3889232888673408171_60020/hlog.dat.005
could only be replicated to 0 nodes, instead of 1
        at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
        at 
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:597)
        at java.lang.Thread.run(Thread.java:595)
2007-11-15 17:03:00,640 INFO org.apache.hadoop.hbase.HRegionServer:
telling master that region server is shutting down at:
172.16.6.57:60020
2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
stopping server at: 172.16.6.57:60020
2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
regionserver/0.0.0.0:60020 exiting

I can provide more logs if necessary. Any ideas or suggestions
about how to track this down? Running the sequentialWrite test with just
1 client works fine, but using 2 or more causes these errors.

Thanks for any help,
Kareem Dana
