i posted on the jira as well - but we should be able to simulate the
effect of the patch.

if the sync were simulated as merely a sleep (for 2-3ms, or whatever the
average RTT for the dfs write pipeline is) instead of an actual call into
the dfs client, it should approximate the effect of the patch. (the appends
would proceed in parallel; each sync would block for some time.)

so we should be able to test whether this gets a performance win for
the queue threshold=1 case.
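
A minimal sketch of that experiment, in case it helps. Everything in it is a hypothetical stand-in: the class SyncSimulation, the streamLock field and the run() helper are made up for illustration, and nothing here calls into the real dfs client. With syncHoldsLock=true the sleep happens while holding the lock that append() needs, which models appends being blocked on a pending sync. With syncHoldsLock=false the sleep happens outside the lock, which models what the patch is expected to allow. Each writer syncs after every append, as in the queue threshold=1 case.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a log writer; it never touches the real DFSClient. */
class SyncSimulation {
    private final Object streamLock = new Object();
    private final boolean syncHoldsLock; // true: sync blocks appends (assumed current behaviour)
    private final long syncMillis;       // assumed pipeline RTT, e.g. 2-3 ms
    private long appended;               // guarded by streamLock

    SyncSimulation(boolean syncHoldsLock, long syncMillis) {
        this.syncHoldsLock = syncHoldsLock;
        this.syncMillis = syncMillis;
    }

    /** Cheap in-memory append; needs the stream lock. */
    void append() {
        synchronized (streamLock) {
            appended++;
        }
    }

    /** The sync is simulated as a plain sleep instead of a call into the dfs client. */
    void sync() throws InterruptedException {
        if (syncHoldsLock) {
            synchronized (streamLock) {
                Thread.sleep(syncMillis); // other writers cannot append meanwhile
            }
        } else {
            Thread.sleep(syncMillis);     // other writers keep appending
        }
    }

    /** Runs the writers to completion and returns the elapsed wall-clock time in ms. */
    static long run(boolean syncHoldsLock, int writers, int editsPerWriter, long syncMillis) {
        SyncSimulation log = new SyncSimulation(syncHoldsLock, syncMillis);
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        long start = System.nanoTime();
        for (int w = 0; w < writers; w++) {
            pool.execute(() -> {
                try {
                    for (int i = 0; i < editsPerWriter; i++) {
                        log.append();
                        log.sync(); // queue threshold = 1: sync after every append
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Comparing SyncSimulation.run(true, 4, 20, 3) against SyncSimulation.run(false, 4, 20, 3) gives a rough idea of how much of the wall-clock time comes from syncs serializing the writers. It says nothing about real pipeline behaviour; for that the sleep would have to be dropped into the actual write path.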

On Wed, Jan 13, 2010 at 10:43 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> Awesome, I will try to post a patch soon and  will let you know as soon as I
> have the first version ready.
>
> thanks,
> dhruba
>
>
> On Wed, Jan 13, 2010 at 10:40 AM, Jean-Daniel Cryans 
> <jdcry...@apache.org> wrote:
>
>> I'll be happy to benchmark, we already have code to test the
>> multi-client hitting 1 region server case.
>>
>> J-D
>>
>> On Wed, Jan 13, 2010 at 10:38 AM, Dhruba Borthakur <dhr...@gmail.com>
>> wrote:
>> > I will try to make a patch for it first. depending on the complexity of
>> the
>> > patch code, we can decide which release it can go in.
>> >
>> > thanks,
>> > dhruba
>> >
>> > On Wed, Jan 13, 2010 at 9:56 AM, Jean-Daniel Cryans <jdcry...@apache.org
>> >wrote:
>> >
>> >> That's great dhruba, I guess the sooner it could go in is 0.21.1?
>> >>
>> >> J-D
>> >>
>> >> On Wed, Jan 13, 2010 at 8:51 AM, Dhruba Borthakur <dhr...@gmail.com>
>> >> wrote:
>> >> > I opened http://issues.apache.org/jira/browse/HDFS-895 for this one.
>> >> >
>> >> > thanks,
>> >> > dhruba
>> >> >
>> >> > On Tue, Jan 12, 2010 at 9:41 PM, Joydeep Sarma <jsensa...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> this is internal to the dfsclient. this would explain why performance
>> >> >> would suck with queue threshold of 1.
>> >> >>
>> >> >> leave it up to Dhruba to explain the details.
>> >> >>
>> >> >> On Tue, Jan 12, 2010 at 9:16 PM, stack <st...@duboce.net> wrote:
>> >> >> > On Tue, Jan 12, 2010 at 9:12 PM, stack <st...@duboce.net> wrote:
>> >> >> >
>> >> >> >> > any IO to a HDFS-file (appends, writes, etc) are actually blocked
>> on
>> >> a
>> >> >> >> > pending sync. "sync" in HDFS is a pretty heavyweight operation
>> as
>> >> it
>> >> >> >> stands.
>> >> >> >>
>> >> >> >> i think this is likely to explain limited throughput with the
>> default
>> >> >> >> write queue threshold of 1. if the appends cannot make progress
>> while
>> >> >> >> one is waiting for the sync - then the write pipeline is going to
>> be
>> >> >> >> idle most of the time (with queue threshold of 1).
>> >> >> >>
>> >> >> >> i think it would be good to have the sync not block other writers
>> on
>> >> >> >> the file/pipeline. logically - it's not clear why it needs to
>> (since
>> >> >> >> the sync is just a wait for the completion as of some write
>> >> >> >> transaction id - allowing new ones to be queued up subsequently).
>> >> >> >
>> >> >> >
>> >> >> > Are you talking about internal to DFSClient Joydeep?  Or some
>> >> >> > synchronization block up in hlog?
>> >> >> >
>> >> >> > St.Ack
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Connect to me at http://www.facebook.com/dhruba
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Connect to me at http://www.facebook.com/dhruba
>> >
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
