Additionally, in a blocking context, if a thread misses the page cache and needs to go to the device itself, it might block other operations that -are- in page cache... using more threads helps there, although it is not a complete solution.
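To make the queue-depth point from the thread below concrete, here is a small standalone Java sketch (illustrative only, not BookKeeper code; `simulatedBlockingIo` stands in for a blocking device read): with 1 thread, N blocking operations serialize, so the "device" only ever sees one outstanding request; with N threads they can all be outstanding at once, which is what gives the elevator/NCQ layers something to reorder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class QueueDepthDemo {

    // Stand-in for a blocking read that waits on the device.
    static void simulatedBlockingIo(long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Issue `ops` blocking operations through a pool of `threads` threads
    // and return elapsed wall-clock time in milliseconds.
    static long timeOps(int threads, int ops, long latencyMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < ops; i++) {
            pending.add(pool.submit(() -> simulatedBlockingIo(latencyMs)));
        }
        for (Future<?> f : pending) {
            f.get(); // wait for each operation to complete
        }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // 1 thread: operations are strictly serialized, so 16 ops take
        // roughly 16 x 50ms.
        long serial = timeOps(1, 16, 50);
        // 16 threads: all 16 ops can be outstanding at once.
        long parallel = timeOps(16, 16, 50);
        System.out.println("1 thread: " + serial + "ms, 16 threads: " + parallel + "ms");
    }
}
```

With real disks the win comes from reordering, not just overlap, but the shape of the effect is the same: more threads means a deeper queue for the kernel scheduler, RAID controller, and NCQ to work with.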
On Tue, Sep 25, 2012 at 3:33 PM, Stu Hood <[email protected]> wrote:

>> If with one thread we are getting pretty high latency (case Aniruddha
>> described), doesn't it mean we have a number of requests queued up?
>
> If you have 1 thread in the system and you're doing blocking IO, then the
> os/hardware can only possibly know about 1 IO operation at a time (because
> the 1 thread can't send the next operation while it is blocked waiting for
> the first operation). If instead you have 16 threads issuing IO to one
> physical device, then there can be 16 outstanding IO operations at once.
>
> An "outstanding" IO operation might be queued a) in the kernel, via an IO
> scheduler [0], b) at a raid controller, or c) in the device itself [1].
> All of these layers implement reordering of operations: and the more
> queueing, the more chance to optimize access order.
>
> One of the simplest exposed metrics is *avgqu-sz*, exposed by linux
> `iostat -x`: it is the average number of queued operations for a device.
>
> [0] for example, CFQ: http://en.wikipedia.org/wiki/CFQ
> [1] NCQ: http://en.wikipedia.org/wiki/Native_Command_Queuing
>
> On Tue, Sep 25, 2012 at 2:06 PM, Flavio Junqueira <[email protected]> wrote:
>
>> Hi Stu, I'm not sure I understand your point. If with one thread we are
>> getting pretty high latency (case Aniruddha described), doesn't it mean
>> we have a number of requests queued up? Adding more threads might only
>> make the problem worse by queueing up even more requests. I'm possibly
>> missing your point...
>>
>> -Flavio
>>
>> On Sep 25, 2012, at 9:37 PM, Stu Hood wrote:
>>
>>> Separating by device would help, but will not allow the devices to be
>>> fully utilized: in order to buffer enough IO commands into a disk's
>>> queue for the elevator algorithms to kick in, you either need to use
>>> multiple threads per disk, or native async IO (not trivially available
>>> within the JVM).
>>>
>>> On Tue, Sep 25, 2012 at 2:23 AM, Flavio Junqueira <[email protected]>
>>> wrote:
>>>
>>>> On Sep 25, 2012, at 10:55 AM, Aniruddha Laud wrote:
>>>>
>>>>> On Tue, Sep 25, 2012 at 1:35 AM, Flavio Junqueira <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Just to add a couple of comments to the discussion, separating reads
>>>>>> and writes into different threads should only help with queuing
>>>>>> latency. It wouldn't help with IO latency.
>>>>>
>>>>> Yes, but with the current implementation, publish latencies in hedwig
>>>>> suffer because of lagging subscribers. By separating read and write
>>>>> queues, we can at least guarantee that the write SLA is maintained (a
>>>>> separate journal disk + a separate thread would ensure that writes
>>>>> are not affected by read-related seeks).
>>>>
>>>> Agreed, and based on my comment below, I was wondering if it wouldn't
>>>> be best to separate traffic across threads by device instead of by
>>>> operation type.
>>>>
>>>>>> Also, it sounds like a good idea to have at least one thread per
>>>>>> ledger device. In the case of multiple ledger devices, if we use one
>>>>>> single thread, then the performance of the bookie will be driven by
>>>>>> the slowest disk, no?
>>>>>
>>>>> yup, makes sense.
>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>> On Sep 25, 2012, at 10:24 AM, Ivan Kelly wrote:
>>>>>>
>>>>>>>> Could you give some information on what those shortcomings are?
>>>>>>>> Also, do let me know if you need any more information from our end.
>>>>>>>
>>>>>>> Off the top of my head:
>>>>>>> - reads and writes are handled in the same thread (as you have
>>>>>>>   observed)
>>>>>>> - each entry read requires a single RPC.
>>>>>>> - entries are read in parallel
>>>>>
>>>>> By parallel, you mean the BufferedChannel wrapper on top of
>>>>> FileChannel, right?
>>>>>>>
>>>>>>> Not all of these could result in the high latency you see, but if
>>>>>>> each entry is being read separately, a sync on the ledger disk in
>>>>>>> between will make a mess of the disk head scheduling.
>>>>>
>>>>> Increasing the time interval between flushing log files might possibly
>>>>> help in this case then?
>>>>>
>>>>>>> -Ivan
>>>>>
>>>>> Thanks for the help :)
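Flavio's suggestion in the thread above, separating traffic across threads by device rather than by operation type, could look roughly like the following sketch. This is hypothetical: `DeviceRouter` and its methods are illustrative names, not actual BookKeeper API.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Route each IO to a pool dedicated to the device backing it, so a slow
// ledger disk only backs up its own queue instead of stalling the bookie.
public class DeviceRouter {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerDevice;

    public DeviceRouter(int threadsPerDevice) {
        // More than one thread per device keeps several IOs outstanding,
        // giving the disk's elevator algorithms something to reorder.
        this.threadsPerDevice = threadsPerDevice;
    }

    // Submit a read or write against the named device; a pool for that
    // device is created lazily on first use.
    public <T> Future<T> submit(String device, Callable<T> io) {
        return pools
            .computeIfAbsent(device, d -> Executors.newFixedThreadPool(threadsPerDevice))
            .submit(io);
    }

    public void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

With this shape, reads and writes against the journal disk and each ledger disk queue independently, so the bookie's overall performance is not driven by the slowest disk, while each per-device pool still provides the queue depth Stu describes.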
