When a datanode dies, any write pipeline that was using that datanode is affected to some extent. The writer goes through an error recovery protocol, which can introduce delays in that write pipeline. Write pipelines that do not include the dead datanode, on the other hand, should not be impacted at all.
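To make the distinction concrete, here is a rough sketch (a toy model, not HDFS code) that splits a set of write pipelines into those that contain the dead datanode and those that do not. The function name, the writer labels and the datanode names are all made up for illustration, and the assumption that recovery simply continues with the surviving nodes is a simplification of the real protocol.

```python
def split_pipelines(pipelines, dead_node):
    """Partition write pipelines by whether they include the dead datanode.

    pipelines: dict mapping a (hypothetical) writer label to the list of
               datanodes in its current write pipeline.
    Returns (affected, unaffected). For affected writers the value is the
    pipeline that remains once the dead node is dropped; this is a
    simplifying assumption about what error recovery leaves behind.
    """
    affected = {}
    unaffected = {}
    for writer, nodes in pipelines.items():
        if dead_node in nodes:
            # This writer has to go through error recovery and may stall.
            affected[writer] = [n for n in nodes if n != dead_node]
        else:
            # This pipeline never touched the dead node.
            unaffected[writer] = list(nodes)
    return affected, unaffected


# Hypothetical layout: replication factor 2, so two datanodes per pipeline.
EXAMPLE_PIPELINES = {
    "writer-1": ["dn1", "dn2"],
    "writer-2": ["dn3", "dn4"],
    "writer-3": ["dn2", "dn5"],
}

if __name__ == "__main__":
    hit, clean = split_pipelines(EXAMPLE_PIPELINES, "dn2")
    print("needs recovery:", hit)
    print("untouched:", clean)
```

If every writer in your test shows up on the "needs recovery" side, a slowdown across all of them during recovery would be consistent with the above; if some writers are on the "untouched" side and still slow down, something else is going on.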

thanks,
dhruba

On Wed, Dec 29, 2010 at 2:57 AM, Rajat Goel <rajatgoe...@gmail.com> wrote:

> I am opening a new file every 5 mins. For every 5 mins, I keep writing to a
> file, then I close the current file and open a new file for writing. My
> block size is 256 MB. Replication factor is 2.
>
> This is my test scenario: I am using a cluster of 6 machines (1 namenode, 5
> datanodes). On each datanode, I am running two threads (one writing to HDFS
> @ 10MB/s and other reading from HDFS @ 20 MB/s.) I shutdown one of the
> datanodes manually and I see that my write thread on live datanodes is no
> longer able to write @10 MB/s to HDFS, write speed becomes slow. The problem
> is writes on live datanodes get affected by a datanode going dead.
>
> I suspect that this may be due to live nodes trying to replicate their
> blocks on dead datanode. I see java.io exceptions on terminal of live
> datanodes saying bad ack from the dead machine.
>
> Can you please tell us what how exactly writes and replication behave when
> a datanode goes down?
>
> Regards,
> Rajat
>
>
> On Wed, Dec 29, 2010 at 11:17 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
>
>> how frequently do you open new files to write? Or do you continue to write
>> to the same file(s) for the entire duration of the test? what is ur block
>> size? can you pl elaborate on your test workload?
>>
>>
>> On Tue, Dec 28, 2010 at 9:45 PM, Rajat Goel <rajatgoe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to measure read/write rates to HDFS under various conditions such
>>> as under heavy load or one data node goes down etc? Is there some profiler
>>> already available for such purpose?
>>>
>>> I am pushing data at high rate to HDFS, reads are also happening in
>>> parallel and I suddenly reboot one datanode. I observe that I am no longer
>>> able to write to HDFS (from live datanodes) at the same higher rate. This
>>> happens for few minutes (around 30 mins), after which things go back to
>>> normal again. I want to find out why HDFS becomes slow, what is the main
>>> contributor of this latency and can I improve this behavior by changing some
>>> configuration parameters.
>>>
>>> Thanks & Regards,
>>> Rajat
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>
>

--
Connect to me at http://www.facebook.com/dhruba