Additionally, in a blocking context, if a thread misses the page cache and
needs to go to the device itself, it might block other operations whose data
*is* already in the page cache... using more threads helps there, although it
is not a complete solution.
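To make the threading point concrete, here is a toy JVM sketch of the "more
threads per blocking device" idea: a fixed pool issuing independent positional
reads, so that one read that blocks on the device does not stall reads that
would be served from cache. Everything here (pool size, file layout, class
name) is illustrative on my part, not code from Hedwig/BookKeeper:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReads {
    // 16 threads => up to 16 outstanding IO operations at once
    static final int IO_THREADS = 16;

    public static void main(String[] args) throws Exception {
        // Toy "ledger" file: 16 blocks of 4 KB
        Path path = Files.createTempFile("bookie-ledger-", ".dat");
        Files.write(path, new byte[16 * 4096]);

        ExecutorService ioPool = Executors.newFixedThreadPool(IO_THREADS);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < 16; i++) {
                final long offset = i * 4096L;
                // Each task is an independent positional read; while one task
                // blocks in the kernel, the others keep the device queue full.
                results.add(ioPool.submit(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    // pread-style call: thread-safe, no shared file position
                    return ch.read(buf, offset);
                }));
            }
            for (Future<Integer> f : results) {
                if (f.get() != 4096) throw new AssertionError("short read");
            }
        } finally {
            ioPool.shutdown();
            Files.deleteIfExists(path);
        }
        System.out.println("all reads completed");
    }
}
```

The positional FileChannel.read(buffer, offset) is what makes this safe
without locking a shared file position; going beyond thread-per-request to
true native async IO would, as noted below, need JNI or similar.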

On Tue, Sep 25, 2012 at 3:33 PM, Stu Hood <[email protected]> wrote:

> If with one thread we are getting pretty high latency (case Aniruddha
>> described), doesn't it mean we have a number of requests queued up?
>
> If you have 1 thread in the system and you're doing blocking IO, then the
> OS/hardware can only possibly know about 1 IO operation at a time (because
> the 1 thread can't send the next operation while it is blocked waiting for
> the first one to complete). If instead you have 16 threads issuing IO to
> one physical device, then there can be 16 outstanding IO operations at
> once.
>
> An "outstanding" IO operation might be queued a) in the kernel, via an IO
> scheduler [0], b) at a RAID controller, or c) in the device itself [1]. All
> of these layers implement reordering of operations, and the more queueing,
> the more chance to optimize the access order.
>
> One of the simplest metrics is '*avgqu-sz*', reported by Linux's
> `iostat -x`: it is the average number of queued operations for a device.
>
> [0] for example, CFQ: http://en.wikipedia.org/wiki/CFQ
> [1] NCQ: http://en.wikipedia.org/wiki/Native_Command_Queuing
>
>
>
> On Tue, Sep 25, 2012 at 2:06 PM, Flavio Junqueira <[email protected]> wrote:
>
>> Hi Stu, I'm not sure I understand your point. If with one thread we are
>> getting pretty high latency (case Aniruddha described), doesn't it mean we
>> have a number of requests queued up? Adding more threads might only make
>> the problem worse by queueing up even more requests. I'm possibly missing
>> your point...
>>
>> -Flavio
>>
>> On Sep 25, 2012, at 9:37 PM, Stu Hood wrote:
>>
>> > Separating by device would help, but will not allow the devices to be
>> > fully utilized: in order to buffer enough IO commands into a disk's queue
>> > for the elevator algorithms to kick in, you either need to use multiple
>> > threads per disk, or native async IO (not trivially available within the
>> > JVM).
>> >
>> > On Tue, Sep 25, 2012 at 2:23 AM, Flavio Junqueira <[email protected]> wrote:
>> >
>> >>
>> >> On Sep 25, 2012, at 10:55 AM, Aniruddha Laud wrote:
>> >>
>> >>> On Tue, Sep 25, 2012 at 1:35 AM, Flavio Junqueira <[email protected]> wrote:
>> >>>
>> >>>> Just to add a couple of comments to the discussion, separating reads
>> >>>> and writes into different threads should only help with queuing
>> >>>> latency. It wouldn't help with IO latency.
>> >>>>
>> >>>
>> >>> Yes, but with the current implementation, publish latencies in Hedwig
>> >>> suffer because of lagging subscribers. By separating read and write
>> >>> queues, we can at least guarantee that the write SLA is maintained (a
>> >>> separate journal disk + a separate thread would ensure that writes are
>> >>> not affected by read-related seeks)
>> >>>
>> >>
>> >> Agreed, and based on my comment below, I was wondering if it wouldn't
>> >> be best to separate traffic across threads by device instead of by
>> >> operation type.
>> >>
>> >>
>> >>>>
>> >>>> Also, it sounds like a good idea to have at least one thread per
>> >>>> ledger device. In the case of multiple ledger devices, if we use a
>> >>>> single thread, then the performance of the bookie will be driven by
>> >>>> the slowest disk, no?
>> >>>>
>> >>> yup, makes sense.
>> >>>
>> >>>>
>> >>>> -Flavio
>> >>>>
>> >>>> On Sep 25, 2012, at 10:24 AM, Ivan Kelly wrote:
>> >>>>
>> >>>>>> Could you give some information on what those shortcomings are?
>> >>>>>> Also, do let me know if you need any more information from our end.
>> >>>>> Off the top of my head:
>> >>>>> - reads and writes are handled in the same thread (as you have observed)
>> >>>>> - each entry read requires a single RPC.
>> >>>>> - entries are read in parallel
>> >>>>
>> >>> By parallel, you mean the BufferedChannel wrapper on top of
>> >>> FileChannel, right?
>> >>>
>> >>>>>
>> >>>>> Not all of these could result in the high latency you see, but if
>> >>>>> each entry is being read separately, a sync on the ledger disk in
>> >>>>> between will make a mess of the disk head scheduling.
>> >>>>
>> >>> Increasing the time interval between flushing log files might
>> >>> possibly help in this case then?
>> >>>
>> >>>>>
>> >>>>> -Ivan
>> >>>>
>> >>>>
>> >>> Thanks for the help :)
>> >>
>> >>
>>
>>
>
