Hello, I am running org.apache.hadoop.fs.TestDFSIO to benchmark our HDFS installation and have a couple of questions about it.
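
For reference, the runs look roughly like this (the test jar name, file count, and file size are placeholders for our actual settings); both runs write under the default TestDFSIO output directory on HDFS:

    # write phase: -nrFiles is the number of concurrent writers (one map task per file),
    # -fileSize is the size of each file in MB
    hadoop jar hadoop-test.jar TestDFSIO -write -nrFiles 50 -fileSize 1000

    # corresponding read phase
    hadoop jar hadoop-test.jar TestDFSIO -read -nrFiles 50 -fileSize 1000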
a) If I run the benchmark back to back in the same directory, I start seeing strange errors such as NotReplicatedYetException or AlreadyBeingCreatedException (failed to create file .... on client 5, because this file is already being created by DFSClient_.... on ...). It seems like there might be some kind of race condition between replication still in progress from the previous run and the subsequent run. Is there any way to avoid this?

b) I have been testing with concurrent writers and see a significant drop in throughput: about 60 MB/s with 1 writer versus about 8 MB/s with 50 concurrent writers. Is this a known scalability limit of HDFS? Is there any way to configure it to perform better?

Thanks,
LR
