Hi Dhruba,

Could you please explain a bit more about this error recovery protocol and
the delay it can introduce? Can we control this delay through HDFS
configuration parameters? I have already tried setting
dfs.client.block.write.retries to 0 and
dfs.namenode.heartbeat.recheck-interval to 1000.
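
For reference, here is a minimal sketch of how I am applying these settings
programmatically (assuming the standard org.apache.hadoop.conf.Configuration
API; as I understand it, the recheck interval is read by the namenode, so it
likely belongs in the namenode's hdfs-site.xml rather than the client config):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ClientConf {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Do not retry a failed block write (client-side setting).
            conf.setInt("dfs.client.block.write.retries", 0);
            // How often the namenode re-checks datanode liveness, in ms
            // (namenode-side setting; shown here only for completeness).
            conf.setInt("dfs.namenode.heartbeat.recheck-interval", 1000);
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Using filesystem: " + fs.getUri());
        }
    }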

Thanks,
Rajat

On Mon, Jan 3, 2011 at 11:06 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:

> When a datanode dies, any write pipeline that was using that datanode is
> affected to some extent. The writer goes through an error recovery protocol
> that can introduce delays in the write pipeline. On the other hand, write
> pipelines that do not include the dead datanode should not be impacted at
> all.
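>
> Roughly, the recovery looks like the following sketch (a simplified
> illustration with made-up names, not the actual DFSClient code):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     class PipelineRecoverySketch {
>         // Simplified: on a bad ack, drop the failed node, rebuild the
>         // pipeline from the surviving replicas, and resend any packets
>         // that were not acknowledged. Each step blocks the writer, which
>         // is where the delay comes from.
>         static List<String> recover(List<String> pipeline, String failed) {
>             List<String> survivors = new ArrayList<String>(pipeline);
>             survivors.remove(failed);   // exclude the dead datanode
>             // ... re-open streams to the survivors and resend packets
>             // after the last acknowledged one.
>             return survivors;
>         }
>     }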
>
> thanks
> dhruba
>
>
>
> On Wed, Dec 29, 2010 at 2:57 AM, Rajat Goel <rajatgoe...@gmail.com> wrote:
>
>> I open a new file every 5 minutes: I keep writing to the current file for
>> 5 minutes, then close it and open a new file for writing. My block size is
>> 256 MB and the replication factor is 2.
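>>
>> For reference, my writer thread is roughly the sketch below (the path and
>> the 64 KB buffer size are illustrative, not my exact code):
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FSDataOutputStream;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>
>>     public class RollingWriter {
>>         public static void main(String[] args) throws Exception {
>>             FileSystem fs = FileSystem.get(new Configuration());
>>             byte[] buf = new byte[64 * 1024];     // 64 KB per write
>>             long rollMs = 5 * 60 * 1000L;         // roll every 5 minutes
>>             while (true) {
>>                 Path p = new Path("/bench/data-" + System.currentTimeMillis());
>>                 // replication factor 2, matching my cluster setting
>>                 FSDataOutputStream out = fs.create(p, (short) 2);
>>                 long start = System.currentTimeMillis();
>>                 while (System.currentTimeMillis() - start < rollMs) {
>>                     out.write(buf);               // rate throttling omitted
>>                 }
>>                 out.close();                      // then roll to a new file
>>             }
>>         }
>>     }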
>>
>> This is my test scenario: I am using a cluster of 6 machines (1 namenode,
>> 5 datanodes). On each datanode, I run two threads: one writing to HDFS at
>> 10 MB/s and the other reading from HDFS at 20 MB/s. When I manually shut
>> down one of the datanodes, the write threads on the live datanodes can no
>> longer sustain 10 MB/s; write speed drops. The problem is that writes on
>> the live datanodes are affected by a datanode going dead.
>>
>> I suspect this may be due to the live nodes trying to replicate their
>> blocks to the dead datanode. I see java.io exceptions in the terminals of
>> the live datanodes reporting a bad ack from the dead machine.
>>
>> Can you please tell us how exactly writes and replication behave when a
>> datanode goes down?
>>
>> Regards,
>> Rajat
>>
>>
>> On Wed, Dec 29, 2010 at 11:17 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
>>
>>> How frequently do you open new files for writing? Or do you keep writing
>>> to the same file(s) for the entire duration of the test? What is your
>>> block size? Can you please elaborate on your test workload?
>>>
>>>
>>> On Tue, Dec 28, 2010 at 9:45 PM, Rajat Goel <rajatgoe...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to measure read/write rates to HDFS under various conditions,
>>>> such as heavy load or a datanode going down. Is there a profiler already
>>>> available for this purpose?
>>>>
>>>> I am pushing data to HDFS at a high rate, with reads happening in
>>>> parallel, and I suddenly reboot one datanode. I observe that I am no
>>>> longer able to write to HDFS (from the live datanodes) at the same high
>>>> rate. This lasts for a while (around 30 minutes), after which things
>>>> return to normal. I want to find out why HDFS becomes slow, what the
>>>> main contributor to this latency is, and whether I can improve this
>>>> behavior by changing some configuration parameters.
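>>>>
>>>> For what it's worth, I measure the write rate with roughly the following
>>>> sketch (the path and sizes are illustrative, not my exact code):
>>>>
>>>>     import org.apache.hadoop.conf.Configuration;
>>>>     import org.apache.hadoop.fs.FSDataOutputStream;
>>>>     import org.apache.hadoop.fs.FileSystem;
>>>>     import org.apache.hadoop.fs.Path;
>>>>
>>>>     public class WriteRateProbe {
>>>>         public static void main(String[] args) throws Exception {
>>>>             FileSystem fs = FileSystem.get(new Configuration());
>>>>             FSDataOutputStream out = fs.create(new Path("/bench/probe"));
>>>>             byte[] buf = new byte[1024 * 1024];   // 1 MB per write
>>>>             long start = System.nanoTime();
>>>>             for (int i = 0; i < 1024; i++) {      // 1 GB total
>>>>                 out.write(buf);
>>>>             }
>>>>             out.close();
>>>>             double secs = (System.nanoTime() - start) / 1e9;
>>>>             System.out.printf("%.1f MB/s%n", 1024 / secs);
>>>>         }
>>>>     }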
>>>>
>>>> Thanks & Regards,
>>>> Rajat
>>>>
>>>
>>>
>>>
>>> --
>>> Connect to me at http://www.facebook.com/dhruba
>>>
>>
>>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
