On Mon, Mar 19, 2012 at 10:41 AM, Emmanuel Lécharny <[email protected]> wrote:
> On 3/19/12 6:26 PM, Selcuk AYA wrote:
>
>> On Mon, Mar 19, 2012 at 9:24 AM, Emmanuel Lécharny <[email protected]>
>> wrote:
>>>
>>> Hi,
>>>
>>> I have a few questions about the handling of the log buffer.
>>>
>>> When we can't write any more data in the buffer, because it's full, we
>>> try to flush the buffer on disk. What happens then is:
>>> - if there is enough room remaining in the buffer, we write a skip record
>>>   (with a -1 length): is it necessary? (we then rewind the buffer)
>>> - otherwise, we rewind the buffer
>>>
>>> In any case, we increment the writeAheadRewindCount: what for?
>>>
>>> Then we call the flush() method, which will be executed only if there is
>>> no other thread flushing the buffer already (just in case the sync()
>>> method is called by another thread). I guess this is intended to allow a
>>> thread to add new data in the buffer while another thread writes the
>>> buffer on disk?
>>>
>>> So AFAIU, only one thread will be allowed to write data into the buffer,
>>> up to the point it reaches a record being held by the flush thread, and
>>> only one thread can flush the data, up to the point it reaches the last
>>> record it can write (which is computed before the flush() method is
>>> called).
>>>
>>> I'm wondering if we couldn't use a simpler algorithm, where we have a
>>> flush thread used to flush the data in any case. If the buffer is full,
>>> we stop writing until we are signaled that there is some room left (and
>>> it is the flush thread's role to signal the writer that it can start
>>> again). That means we write as much as we can, signaling each record to
>>> the flush thread, and the flush thread will consume the records when they
>>> arrive. If both are colliding (ie, no more room remains in the buffer,
>>> the reader will have to wait for the writer to wake it up). We won't need
>>> to use a buffer at all, we just pass the records (plus their headers and
>>> trailers) in a queue, avoiding a copy in a temporary memory.
>>>
>>> This is basically doing the same thing, but we don't wait until the
>>> buffer is full to wake up the writer. This is the way the network layer
>>> works in NIO, with a selector signaling the writer thread when it's ready
>>> to accept some more data to be written.
>>
>> I am confused about the buffering (or no buffering) you suggest. Are
>> you suggesting a flush thread will write directly off the user's
>> buffer without any in-memory copy?
>
> Yes. In fact, I suggest we buffer the records, without copying them. When
> the flush thread is woken up (or kicked), it will write the header, the
> buffer, the footer. We can use ByteBuffer gathering for that (see
> http://tutorials.jenkov.com/java-nio/scatter-gather.html)

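Just to make sure we are talking about the same thing, a gathering write along those lines might look roughly like the sketch below. The 4-byte length header/footer layout and the names GatherWriteSketch and writeRecord are made up for illustration; they are not the actual log record format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatherWriteSketch {

    // Hypothetical layout: 4-byte length header, user payload, 4-byte length footer.
    // The payload buffer is handed to the channel as-is, without copying it
    // into an intermediate log buffer on the Java side.
    static long writeRecord(FileChannel channel, ByteBuffer userData) throws IOException {
        int length = userData.remaining();

        ByteBuffer header = ByteBuffer.allocate(4).putInt(length);
        header.flip();
        ByteBuffer footer = ByteBuffer.allocate(4).putInt(length);
        footer.flip();

        ByteBuffer[] parts = { header, userData, footer };
        long total = 8L + length;
        long written = 0;
        // A gathering write may be partial, so loop until everything is out.
        while (written < total) {
            written += channel.write(parts);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wal-sketch", ".log");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            System.out.println("bytes written: " + writeRecord(channel, payload));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```
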
I see. But this is effectively what we are doing, right? Instead of putting
the buffers in a queue and doing scatter/gather through byte buffers (which
will eventually do a memcpy to do a single batched write, I think), we copy
into an in-memory buffer and let the flushing thread do the single batched
write.

>
>>
>> Currently things work like this on the common code path:
>>
>> * for user threads:
>>   prepare record
>>   get log latch
>>   copy into in-memory buffer and get LSN (logical sequence number)
>>   release log latch
>>   return LSN
>>
>> * for the background flushing thread:
>>   wake up periodically, reap the in-memory log and write
>>
>> So the background thread does not necessarily wait for the buffer to be
>> full to wake up and write. In the hopefully less common case, if the
>> buffer is full, a user thread will take one for the team and write the
>> buffer (we could signal the flush thread as an alternative here).
>>
>> In the common case, this allows user threads not to wait for the write
>> and to get an LSN quickly (the LSN is important to order log records),
>> and it allows batching of writes. Similar algorithms are used in all the
>> database WAL code I looked at (including Apache Derby).
>
> I have something different in mind to get the records ordered: inject them
> in a queue (as only one single writer will access the queue, the order will
> be guaranteed). The flush thread will be waiting on this queue to be
> modified to flush the data on disk. This queue can contain a limited number
> of records, and we can check that the record size does not exceed a
> certain amount.
>
> In any case, the flush thread is autonomous, and can either be woken up
> when the queue has some data, or wait to be woken up periodically, or when
> the queue is full.
>
> Does it make sense?
> Note: I'm not suggesting that we should change the current code, just
> trying to get some food for thought for later improvement...
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
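
To make the common code path above a bit more concrete, here is a heavily simplified sketch. It is not the real code: the class and method names are invented, the LSN is assumed to be a plain byte offset, a ByteArrayOutputStream stands in for the log file, and the flush holds the latch for its whole duration, whereas the real buffer is circular and lets appends proceed during a flush.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LogBufferSketch {
    private final ReentrantLock logLatch = new ReentrantLock();
    private final ByteBuffer buffer;                                       // in-memory log buffer
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for the log file
    private long nextLsn = 0;                                              // assumption: LSN = byte offset

    LogBufferSketch(int capacity) {
        buffer = ByteBuffer.allocate(capacity);
    }

    // User thread path: get latch, copy into buffer, get LSN, release latch.
    long append(byte[] record) {
        if (record.length > buffer.capacity()) {
            throw new IllegalArgumentException("record larger than log buffer");
        }
        logLatch.lock();
        try {
            if (buffer.remaining() < record.length) {
                flushLocked(); // less common case: the user thread writes the full buffer itself
            }
            long lsn = nextLsn;
            buffer.put(record);
            nextLsn += record.length;
            return lsn;
        } finally {
            logLatch.unlock();
        }
    }

    // Background path: reap whatever is in the buffer and write it in one batch.
    void flush() {
        logLatch.lock();
        try {
            flushLocked();
        } finally {
            logLatch.unlock();
        }
    }

    long flushedBytes() {
        return disk.size();
    }

    private void flushLocked() {
        buffer.flip();
        disk.write(buffer.array(), 0, buffer.limit());
        buffer.clear();
    }

    public static void main(String[] args) throws Exception {
        LogBufferSketch log = new LogBufferSketch(64);
        // The flusher wakes up periodically; it does not wait for the buffer to fill.
        ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleWithFixedDelay(log::flush, 10, 10, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 5; i++) {
            System.out.println("LSN " + log.append(new byte[20]));
        }
        flusher.shutdown();
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        log.flush();
        System.out.println("flushed bytes: " + log.flushedBytes());
    }
}
```

The point of the sketch is only that the user thread returns as soon as it has its LSN, and the actual write is batched and done by whoever flushes.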
