This is an interesting observation overall. Most of our focus has been on 
the writes, which are mostly sequential. For sequential writes, I would 
think that introducing parallelism does not make sense. Parallelism is 
mostly beneficial when we have random seeks, which is what we get when 
reading from multiple ledgers. When read traffic increases and hits many 
ledgers, it sounds right that we need a good way to introduce parallelism 
so that we can improve the utilization of the disk devices. 

If we are to use multiple threads to increase the degree of parallelism, I was 
wondering how many threads we would actually need. Any clue? Does it depend on 
the number of ledgers we are concurrently accessing?

-Flavio

On Sep 26, 2012, at 12:33 AM, Stu Hood wrote:

>> 
>> If with one thread we are getting pretty high latency (case Aniruddha
>> described), doesn't it mean we have a number of requests queued up?
> 
> If you have 1 thread in the system and you're doing blocking IO, then the
> OS/hardware can only possibly know about 1 IO operation at a time (because
> the 1 thread can't send the next operation while it is blocked waiting for
> the first operation). If instead you have 16 threads issuing IO to one
> physical device, then there can be 16 outstanding IO operations at once.
> 
> An "outstanding" IO operation might be queued a) in the kernel, via an IO
> scheduler [0], b) at a raid controller, c) in the device itself [1]. All of
> these layers implement reordering of operations: and the more queueing, the
> more chance to optimize access order.
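> 
> To make that concrete, here is a minimal sketch of the technique (not
> BookKeeper code; the file path and sizes are made up): N threads issue
> blocking positional reads against one file, so the device can see N
> outstanding requests at once.
> 
>     import java.nio.ByteBuffer;
>     import java.nio.channels.FileChannel;
>     import java.nio.file.Paths;
>     import java.nio.file.StandardOpenOption;
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
> 
>     public class QueueDepthSketch {
>         public static void main(String[] args) throws Exception {
>             final int nThreads = 16;  // target IO queue depth
>             // Hypothetical entry log; any large file on the device works.
>             final FileChannel ch = FileChannel.open(
>                     Paths.get("/ledger-disk/entrylog.0"),
>                     StandardOpenOption.READ);
>             ExecutorService pool = Executors.newFixedThreadPool(nThreads);
>             for (int i = 0; i < nThreads; i++) {
>                 final long offset = i * 4L * 1024 * 1024; // scatter reads
>                 pool.submit(new Runnable() {
>                     public void run() {
>                         try {
>                             // Positional reads are thread-safe on a
>                             // FileChannel and block only this thread, so
>                             // all nThreads requests can be outstanding at
>                             // the device simultaneously.
>                             ch.read(ByteBuffer.allocate(4096), offset);
>                         } catch (Exception e) {
>                             e.printStackTrace();
>                         }
>                     }
>                 });
>             }
>             pool.shutdown();
>         }
>     }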
> 
> One of the simplest metrics to watch is '*avgqu-sz*', reported by Linux
> `iostat -x`: it is the average number of queued operations for a device.
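> For example, running `iostat -x 1` on the bookie host prints those
> extended stats once per second; if avgqu-sz on the ledger disk hovers
> around 1 while read latency is high, the bottleneck is likely the single
> IO thread rather than the device itself.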
> 
> [0] for example, CFQ: http://en.wikipedia.org/wiki/CFQ
> [1] NCQ: http://en.wikipedia.org/wiki/Native_Command_Queuing
> 
> 
> 
> On Tue, Sep 25, 2012 at 2:06 PM, Flavio Junqueira <[email protected]> wrote:
> 
>> Hi Stu, I'm not sure I understand your point. If with one thread we are
>> getting pretty high latency (case Aniruddha described), doesn't it mean we
>> have a number of requests queued up? Adding more threads might only make
>> the problem worse by queueing up even more requests. I'm possibly missing
>> your point...
>> 
>> -Flavio
>> 
>> On Sep 25, 2012, at 9:37 PM, Stu Hood wrote:
>> 
>>> Separating by device would help, but will not allow the devices to be
>>> fully utilized: in order to buffer enough IO commands into a disk's
>>> queue for the elevator algorithms to kick in, you either need to use
>>> multiple threads per disk, or native async IO (not trivially available
>>> within the JVM).
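>>>
>>> As a rough sketch of the threads-per-disk option (hypothetical code;
>>> names like ledgerDirs and threadsPerDisk are made up, and this is not
>>> how the bookie is structured today):
>>>
>>>     import java.io.File;
>>>     import java.util.HashMap;
>>>     import java.util.Map;
>>>     import java.util.concurrent.ExecutorService;
>>>     import java.util.concurrent.Executors;
>>>
>>>     class PerDiskReaders {
>>>         private final Map<File, ExecutorService> pools =
>>>                 new HashMap<File, ExecutorService>();
>>>
>>>         // One small pool per ledger directory (assuming one directory
>>>         // per physical disk), so a slow device stalls only its own
>>>         // reader threads while still keeping several IOs queued at it.
>>>         PerDiskReaders(File[] ledgerDirs, int threadsPerDisk) {
>>>             for (File dir : ledgerDirs) {
>>>                 pools.put(dir,
>>>                         Executors.newFixedThreadPool(threadsPerDisk));
>>>             }
>>>         }
>>>
>>>         // Route a read to the pool owning the disk holding the entry.
>>>         void submitRead(File dirForEntry, Runnable readTask) {
>>>             pools.get(dirForEntry).submit(readTask);
>>>         }
>>>     }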
>>> 
>>> On Tue, Sep 25, 2012 at 2:23 AM, Flavio Junqueira <[email protected]>
>>> wrote:
>>> 
>>>> 
>>>> On Sep 25, 2012, at 10:55 AM, Aniruddha Laud wrote:
>>>> 
>>>>> On Tue, Sep 25, 2012 at 1:35 AM, Flavio Junqueira <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Just to add a couple of comments to the discussion, separating reads
>>>>>> and writes into different threads should only help with queuing
>>>>>> latency. It wouldn't help with IO latency.
>>>>>> 
>>>>> 
>>>>> Yes, but with the current implementation, publish latencies in Hedwig
>>>>> suffer because of lagging subscribers. By separating read and write
>>>>> queues, we can at least guarantee that the write SLA is maintained (a
>>>>> separate journal disk + a separate thread would ensure that writes
>>>>> are not affected by read-related seeks).
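>>>>>
>>>>> A minimal sketch of that split (hypothetical, not the current bookie
>>>>> code; the pool sizes are arbitrary):
>>>>>
>>>>>     import java.util.concurrent.ExecutorService;
>>>>>     import java.util.concurrent.Executors;
>>>>>
>>>>>     class SplitIoExecutors {
>>>>>         // Journal writes get a dedicated thread, so a backlog of
>>>>>         // reads can never queue ahead of a write and hurt the
>>>>>         // publish SLA.
>>>>>         final ExecutorService writeExec =
>>>>>                 Executors.newSingleThreadExecutor();
>>>>>         // Reads get their own pool and only compete with each other.
>>>>>         final ExecutorService readExec =
>>>>>                 Executors.newFixedThreadPool(8);
>>>>>     }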
>>>>> 
>>>> 
>>>> Agreed, and based on my comment below, I was wondering if it wouldn't
>>>> be best to separate traffic across threads by device instead of by
>>>> operation type.
>>>> 
>>>> 
>>>>>> 
>>>>>> Also, it sounds like a good idea to have at least one thread per
>>>>>> ledger device. In the case of multiple ledger devices, if we use a
>>>>>> single thread, then the performance of the bookie will be driven by
>>>>>> the slowest disk, no?
>>>>>> 
>>>>> yup, makes sense.
>>>>> 
>>>>>> 
>>>>>> -Flavio
>>>>>> 
>>>>>> On Sep 25, 2012, at 10:24 AM, Ivan Kelly wrote:
>>>>>> 
>>>>>>>> Could you give some information on what those shortcomings are?
>>>>>>>> Also, do let me know if you need any more information from our end.
>>>>>>> Off the top of my head:
>>>>>>> - reads and writes are handled in the same thread (as you have
>>>>>>>   observed)
>>>>>>> - each entry read requires a single RPC
>>>>>>> - entries are read in parallel
>>>>>> 
>>>>> By parallel, you mean the BufferedChannel wrapper on top of
>>>>> FileChannel, right?
>>>>> 
>>>>>>> 
>>>>>>> Not all of these could result in the high latency you see, but if
>>>>>>> each entry is being read separately, a sync on the ledger disk in
>>>>>>> between will make a mess of the disk head scheduling.
>>>>>> 
>>>>> Increasing the time interval between flushes of the log files might
>>>>> help in this case, then?
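>>>>>
>>>>> If I remember right, that would be the `flushInterval` setting in the
>>>>> bookie server configuration (worth double-checking the exact key),
>>>>> e.g. something like:
>>>>>
>>>>>     # bk_server.conf -- illustrative value only
>>>>>     flushInterval=1000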
>>>>> 
>>>>>>> 
>>>>>>> -Ivan
>>>>>> 
>>>>>> 
>>>>> Thanks for the help :)
>>>> 
>>>> 
>> 
>> 
