Hi, I am trying to benchmark my HDFS cluster for I/O performance and wanted to use the dfsthroughput benchmark that ships with Hadoop. I could not find much help for it on the web, and had to go through the code to understand that it compares read/write performance of the local file system, the raw local file system, and HDFS. When I try to run it (with 2 GB of data here), I get a file permission error:
hduser@XXXX:/work/hadoop-localhost$ bin/hadoop jar hadoop-test-1.0.1.jar dfsthroughput -Dmapred.temp.dir='/work/hadoop-datastore/hadoop-hduser1' -Ddfsthroughput.file.size=2368709120 -Ddfs.datanode.data.dir.perm=755 1

Local = /work/hadoop-datastore/hadoop-hduser1
Writing local time: 10
Reading local time: 1
Writing raw time: 10
Reading raw time: 7
Writing checked time: 11
Reading checked time: 8
Generating host names for datanodes
Starting DataNode 0 with dfs.data.dir: /work/hadoop-datastore/hadoop-hduser1/dfs/data/data1,/work/hadoop-datastore/hadoop-hduser1/dfs/data/data2
Starting DataNode 0 with hostname set to: host0.foo.com
Adding node with hostname : host0.foo.com to rack /foo
12/05/04 02:11:08 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
12/05/04 02:11:08 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
12/05/04 02:11:08 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
12/05/04 02:11:08 INFO impl.MetricsSystemImpl: NameNode metrics system started
12/05/04 02:11:08 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
12/05/04 02:11:08 WARN impl.MetricsSystemImpl: Source name ugi already exists!
12/05/04 02:11:08 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
12/05/04 02:11:08 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered.
12/05/04 02:11:10 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
12/05/04 02:11:10 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort60328 registered.
12/05/04 02:11:10 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort60328 registered.
12/05/04 02:11:10 WARN impl.MetricsSystemImpl: NameNode metrics system already initialized!
12/05/04 02:11:10 WARN impl.MetricsSystemImpl: Source name ugi already exists!
12/05/04 02:11:10 WARN datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /work/hadoop-datastore/hadoop-hduser1/dfs/data/data1, expected: rwxr-xr-x, while actual: rwxrwxr-x
12/05/04 02:11:10 WARN datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /work/hadoop-datastore/hadoop-hduser1/dfs/data/data2, expected: rwxr-xr-x, while actual: rwxrwxr-x
12/05/04 02:11:10 ERROR datanode.DataNode: All directories in dfs.data.dir are invalid.
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:422)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:280)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:124)
        at org.apache.hadoop.hdfs.BenchmarkThroughput.run(BenchmarkThroughput.java:209)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hdfs.BenchmarkThroughput.main(BenchmarkThroughput.java:229)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It looks like dfs.datanode.data.dir.perm=755 does not help here (I even tried putting it in hdfs-site.xml in addition to passing it on the command line): HDFS still creates data1 and data2 with 775, and the NullPointerException above seems to be just a follow-on failure once the DataNode refuses to start. I have checked the ownership and permissions of dfs.data.dir, but the new directories are still being created with 775 permissions. Am I missing something here? I have seen a similar error in an HBase test run (https://issues.apache.org/jira/browse/HBASE-5711). Has anybody else faced the same problem?

Also, I see another DFS benchmark called S-live, but I don't see it in hadoop-*-test*.jar. Can someone help me locate and run the S-live benchmark?

Thanks,
Akshay
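P.S. For completeness, this is the property block I put in conf/hdfs-site.xml when trying the config-file route (it made no difference):

    <property>
      <name>dfs.datanode.data.dir.perm</name>
      <value>755</value>
    </property>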
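P.P.S. One thing I have not tried yet, in case it gives anyone a clue: 775 (rwxrwxr-x) is exactly what a default umask of 002 would produce, so my guess is that the data1/data2 directories are recreated by MiniDFSCluster on each run with whatever mode the process umask allows. Assuming directory creation honors the shell's umask, something like this might force them to come out as 755:

    umask 022    # new directories then get 777 & ~022 = 755 (rwxr-xr-x), which is what the DataNode expects
    bin/hadoop jar hadoop-test-1.0.1.jar dfsthroughput \
        -Dmapred.temp.dir='/work/hadoop-datastore/hadoop-hduser1' \
        -Ddfsthroughput.file.size=2368709120 \
        -Ddfs.datanode.data.dir.perm=755 1

This is untested speculation on my part, so I would appreciate confirmation from anyone who knows how the benchmark creates these directories.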