Hi Dhruba,

Can you please explain a bit more about this error recovery protocol and the
delay it could introduce? Can we control this delay via HDFS configuration
parameters? I have already tried setting dfs.client.block.write.retries to 0
and dfs.namenode.heartbeat.recheck-interval to 1000.
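For reference, this is roughly how I am setting them (a minimal sketch using
the standard org.apache.hadoop.conf.Configuration API; the class name is
illustrative, and I assume the heartbeat recheck interval also has to go into
the NameNode's own hdfs-site.xml for it to actually take effect):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ClientConfSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side: fail the write fast instead of retrying on pipeline errors.
            conf.setInt("dfs.client.block.write.retries", 0);
            // NameNode-side setting; presumably a no-op when set only on the
            // client, so it likely belongs in the NameNode's hdfs-site.xml.
            conf.setInt("dfs.namenode.heartbeat.recheck-interval", 1000);
            // Writers created from this FileSystem pick up the retry setting.
            FileSystem fs = FileSystem.get(conf);
        }
    }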
Thanks,
Rajat

On Mon, Jan 3, 2011 at 11:06 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:

> When a datanode dies, any write pipeline that was using that datanode gets
> affected to a certain extent. The writer goes through an error recovery
> protocol that could introduce delays in the write pipeline. On the other
> hand, other write pipelines that do not encompass the dead datanode should
> not be impacted at all.
>
> thanks
> dhruba
>
> On Wed, Dec 29, 2010 at 2:57 AM, Rajat Goel <rajatgoe...@gmail.com> wrote:
>
>> I am opening a new file every 5 minutes: I keep writing to a file for 5
>> minutes, then I close the current file and open a new file for writing.
>> My block size is 256 MB. Replication factor is 2.
>>
>> This is my test scenario: I am using a cluster of 6 machines (1 namenode,
>> 5 datanodes). On each datanode, I am running two threads (one writing to
>> HDFS at 10 MB/s and the other reading from HDFS at 20 MB/s). I shut down
>> one of the datanodes manually and see that my write threads on the live
>> datanodes are no longer able to write to HDFS at 10 MB/s; write speed
>> becomes slow. The problem is that writes on live datanodes get affected
>> by a datanode going dead.
>>
>> I suspect that this may be due to live nodes trying to replicate their
>> blocks onto the dead datanode. I see java.io exceptions on the terminals
>> of the live datanodes reporting a bad ack from the dead machine.
>>
>> Can you please tell us how exactly writes and replication behave when a
>> datanode goes down?
>>
>> Regards,
>> Rajat
>>
>> On Wed, Dec 29, 2010 at 11:17 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
>>
>>> How frequently do you open new files to write? Or do you continue to
>>> write to the same file(s) for the entire duration of the test? What is
>>> your block size? Can you please elaborate on your test workload?
>>>
>>> On Tue, Dec 28, 2010 at 9:45 PM, Rajat Goel <rajatgoe...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to measure read/write rates to HDFS under various conditions,
>>>> such as heavy load or one datanode going down. Is there a profiler
>>>> already available for this purpose?
>>>>
>>>> I am pushing data at a high rate to HDFS, reads are also happening in
>>>> parallel, and I suddenly reboot one datanode. I observe that I am no
>>>> longer able to write to HDFS (from the live datanodes) at the same high
>>>> rate. This goes on for a while (around 30 minutes), after which things
>>>> go back to normal. I want to find out why HDFS becomes slow, what the
>>>> main contributor to this latency is, and whether I can improve this
>>>> behavior by changing some configuration parameters.
>>>>
>>>> Thanks & Regards,
>>>> Rajat
>>>
>>> --
>>> Connect to me at http://www.facebook.com/dhruba
>
> --
> Connect to me at http://www.facebook.com/dhruba
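For reference, a minimal sketch of the write workload described in this
thread (roll to a new file every 5 minutes, throttled to roughly 10 MB/s),
using the standard FileSystem/FSDataOutputStream API. The class name, HDFS
URI, and output paths are illustrative, and the sleep-based throttle is only
an approximation of the test harness described above:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingWriter {
        private static final long ROLL_INTERVAL_MS = 5 * 60 * 1000; // new file every 5 min
        private static final int CHUNK = 1024 * 1024;               // write 1 MB at a time
        private static final int CHUNKS_PER_SEC = 10;               // ~10 MB/s target rate

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Illustrative URI; block size and replication come from the cluster config.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
            byte[] buf = new byte[CHUNK];

            while (true) {
                Path file = new Path("/bench/data-" + System.currentTimeMillis());
                FSDataOutputStream out = fs.create(file);
                long rollAt = System.currentTimeMillis() + ROLL_INTERVAL_MS;
                try {
                    while (System.currentTimeMillis() < rollAt) {
                        long start = System.currentTimeMillis();
                        for (int i = 0; i < CHUNKS_PER_SEC; i++) {
                            out.write(buf); // each write goes through the block write pipeline
                        }
                        long elapsed = System.currentTimeMillis() - start;
                        if (elapsed < 1000) Thread.sleep(1000 - elapsed); // crude throttle
                    }
                } finally {
                    out.close(); // close the current file, then roll to the next one
                }
            }
        }
    }

Each out.write() call feeds the write pipeline Dhruba describes above, so a
pipeline that includes the dead datanode stalls this loop until error
recovery completes, while writers whose pipelines avoid that datanode should
be unaffected.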