I don't understand whether it is expected that the devices are idle while we have a number of requests queued. Not being able to leverage the optimizations the disk devices implement doesn't mean that the devices should be idle; it simply means that we are not giving the devices choices to select from. If we have enough requests on the bookie side, I would expect the devices to be fully utilized, though perhaps not efficiently utilized (see the first sketch below).
-Flavio

On Sep 26, 2012, at 2:06 AM, Sijie Guo wrote:

> For serving requests, either queuing the requests in bookie server per
> channel (write/read are blocking operations), or queueing in os kernel to
> let block device queuing and schedule those io requests. I think Stu's
> point is to leverage block device's schedule algorithm to issue io requests
> in multiple threads to fully utilize the disk bandwidth.
>
> from the iostat reports provided by Aniruddha, the average queue length and
> utilized percentage are not high, which means most of time the disks are
> idle. It makes sense to use multiple threads to issue read requests. one
> write thread and several read threads might work for each device.
>
> On Wed, Sep 26, 2012 at 5:06 AM, Flavio Junqueira <[email protected]> wrote:
>
>> Hi Stu, I'm not sure I understand your point. If with one thread we are
>> getting pretty high latency (case Aniruddha described), doesn't it mean we
>> have a number of requests queued up? Adding more threads might only make
>> the problem worse by queueing up even more requests. I'm possibly missing
>> your point...
>>
>> -Flavio
>>
>> On Sep 25, 2012, at 9:37 PM, Stu Hood wrote:
>>
>>> Separating by device would help, but will not allow the devices to be
>>> fully utilized: in order to buffer enough io commands into a disk's queue
>>> for the elevator algorithms to kick in, you either need to use multiple
>>> threads per disk, or native async IO (not trivially available within the
>>> JVM.)
>>>
>>> On Tue, Sep 25, 2012 at 2:23 AM, Flavio Junqueira <[email protected]> wrote:
>>>
>>>> On Sep 25, 2012, at 10:55 AM, Aniruddha Laud wrote:
>>>>
>>>>> On Tue, Sep 25, 2012 at 1:35 AM, Flavio Junqueira <[email protected]> wrote:
>>>>>
>>>>>> Just to add a couple of comments to the discussion, separating reads
>>>>>> and writes into different threads should only help with queuing
>>>>>> latency. It wouldn't help with IO latency.
>>>>>
>>>>> Yes, but with the current implementation, publishes latencies in hedwig
>>>>> suffer because of lagging subscribers. By separating read and write
>>>>> queues, we can at least guarantee that the write SLA is maintained
>>>>> (separate journal disk + separate thread would ensure that writes are
>>>>> not affected by read related seeks)
>>>>
>>>> Agreed and based on my comment below, I was wondering if it wouldn't be
>>>> best to separate traffic across threads by device instead of by
>>>> operation type.
>>>>
>>>>>> Also, it sounds like a good idea to have at least one thread per
>>>>>> ledger device. In the case of multiple ledger devices, if we use one
>>>>>> single thread, then the performance of the bookie will be driven by
>>>>>> the slowest disk, no?
>>>>>
>>>>> yup, makes sense.
>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>> On Sep 25, 2012, at 10:24 AM, Ivan Kelly wrote:
>>>>>>
>>>>>>>> Could you give some information on what those shortcomings are?
>>>>>>>> Also, do let me know if you need any more information from our end.
>>>>>>> Off the top of my head:
>>>>>>> - reads and writes are handled in the same thread (as you have observed)
>>>>>>> - each entry read requires a single RPC.
>>>>>>> - entries are read in parallel
>>>>>
>>>>> By parallel, you mean the BufferedChannel wrapper on top of FileChannel,
>>>>> right?
>>>>>
>>>>>>> Not all of these could result in the high latency you see, but if
>>>>>>> each entry is being read separately, a sync on the ledger disk in
>>>>>>> between will make a mess of the disk head scheduling.
>>>>>
>>>>> Increasing the time interval between flushing log files might possibly
>>>>> help in this case then?
>>>>>
>>>>>>> -Ivan
>>>>>
>>>>> Thanks for the help :)
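[Editor's note] For illustration, a minimal Java sketch of the situation being debated above: with a single thread issuing blocking reads, requests pile up inside the bookie while the block device only ever sees one outstanding request, so iostat shows a short device queue and low utilization even though clients are waiting. Class and method names here are invented for the example; this is not BookKeeper's actual code.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SingleThreadedReader implements Runnable {
    static class ReadRequest {
        final FileChannel channel;
        final long offset;
        final ByteBuffer buffer;
        ReadRequest(FileChannel channel, long offset, ByteBuffer buffer) {
            this.channel = channel;
            this.offset = offset;
            this.buffer = buffer;
        }
    }

    private final BlockingQueue<ReadRequest> pending = new LinkedBlockingQueue<>();

    // Requests queue here, in userspace, not in the kernel's device queue.
    void submit(ReadRequest r) {
        pending.add(r);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ReadRequest r = pending.take();
                // Blocking positional read: the next request is not issued
                // until this one completes, so the device queue depth never
                // exceeds 1 and the elevator has nothing to reorder.
                r.channel.read(r.buffer, r.offset);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            // A real server would surface this to the client; the sketch just stops.
        }
    }
}

So the devices are "busy" in the sense that they are always serving exactly one request, but never get a batch to schedule over, which matches the low avgqu-sz readings mentioned in the thread.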

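[Editor's note] And a rough sketch, under the same caveats (hypothetical names, not the actual bookie threading model), of the "one write thread and several read threads per device" layout Sijie suggests: a dedicated journal writer keeps the write SLA independent of reads, while a small fixed pool per ledger device keeps several reads in flight so the kernel elevator has requests to reorder and the slowest disk does not stall the others.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerDeviceIoThreads {
    // Journal writes get their own thread (and, ideally, their own disk),
    // so they never queue behind read-related seeks.
    private final ExecutorService journalWriter = Executors.newSingleThreadExecutor();

    // One small read pool per ledger device, keyed by device name.
    private final Map<String, ExecutorService> readersByDevice = new HashMap<>();

    PerDeviceIoThreads(List<String> ledgerDevices, int readThreadsPerDevice) {
        for (String device : ledgerDevices) {
            readersByDevice.put(device, Executors.newFixedThreadPool(readThreadsPerDevice));
        }
    }

    void submitWrite(Runnable writeTask) {
        journalWriter.execute(writeTask);
    }

    // Reads are routed to the pool of the device holding the entry, so a
    // slow disk only backs up its own pool while the others keep going.
    void submitRead(String device, Runnable readTask) {
        readersByDevice.get(device).execute(readTask);
    }
}

Partitioning by device rather than only by operation type also addresses Flavio's point earlier in the thread: with a single thread across several ledger devices, overall read performance would be governed by the slowest disk.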