reddycharan commented on issue #570: Multiple active entrylogs
   **Design Overview** 
   **Entrylog per Ledger**
   As the name suggests, in this approach a bookie has an active entrylog
dedicated to each active ledger. However, this is not a strict one-to-one
mapping of ledger to entrylog in a bookie. A strict one-to-one relationship
between ledger and entrylog is not possible for multiple reasons (a ledgerdir
might become full, an entrylog might reach its capacity, segmentation/replication
can happen, the bookie might crash, ...). Besides, once an entrylog is rotated
it cannot be reopened: while rotating an entrylog file, EntryLogger appends the
ledger map at the end of the entry log and updates the entrylog file with the
offset and size of the map, which effectively seals the file. EntryLogger also
maintains a pointer called 'leastUnflushedLogId', the least entrylogid that is
not yet rotated and closed. Since entrylogids are sequential numbers, GC
considers all entrylogs with a logid lower than 'leastUnflushedLogId' eligible
for compaction/garbage collection. In summary, once an entrylog is rotated and
closed, we need to keep the entrylog file immutable.
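
The GC eligibility rule above fits in a few lines. This is a minimal sketch with hypothetical helpers `leastUnflushedLogId` and `gcEligible` (not EntryLogger methods): the pointer is taken as the smallest id among entrylogs that are still active, and GC considers only ids strictly below it. The assumption that the pointer equals the next id to be assigned when no log is active is ours. The example also shows a consequence that matters once several entrylogs can be active at the same time: under this rule, an entrylog that stays active holds back every entrylog with a higher id, even ones already rotated.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class GcEligibility {
    // Hypothetical: smallest id among entrylogs not yet rotated and closed.
    // Assumption: with no active log, the pointer is the next id to be assigned.
    static long leastUnflushedLogId(Set<Long> activeLogIds, long nextLogId) {
        return activeLogIds.stream().min(Long::compare).orElse(nextLogId);
    }

    // Entrylog ids are sequential, so GC considers only ids below the pointer.
    static List<Long> gcEligible(List<Long> allLogIds, long leastUnflushedLogId) {
        return allLogIds.stream()
                .filter(id -> id < leastUnflushedLogId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Logs 1..5 exist; 3 and 5 are still active, 4 is already rotated.
        long pointer = leastUnflushedLogId(Set.of(3L, 5L), 6L);
        System.out.println(pointer); // 3
        // Log 4 is sealed but not eligible yet: active log 3 holds the pointer back.
        System.out.println(gcEligible(List.of(1L, 2L, 3L, 4L, 5L), pointer)); // [1, 2]
    }
}
```
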
    So instead we can provide a relaxed constraint: an entrylog is
dedicated/committed to a ledger, but not the other way around. In most cases
there would be just one entrylog for a ledger in a bookie, but the entries of a
ledger might end up in different entrylogs in situations like these: when an
entrylog reaches capacity, it is rotated and a new one is created for that
ledger; when a ledgerdir is full, all the entrylogs in that ledgerdir are
rotated and new ones are created for those ledgers; because of segmentation and
replication, various segments of a ledger might end up in different entrylogs;
and after a bookie crash, a new entrylog is created while replaying the journal
for the leftover entries. To briefly summarize this approach:
   - add a server configuration specifying whether entrylogperledger is enabled
   - for the previous behavior (one active entrylog), that config can be disabled
   - when EntryLogger receives an addEntry call, it needs to know the entrylog
for the current ledger
   - so EntryLogger needs to maintain a mapping of ledgerId to entrylogid. If the
in-memory map doesn't contain an entry for the ledger, EntryLogger will create
a new entrylog and add the mapping of ledgerId to the new entrylogid
   - to create a new entrylog, EntryLogger will pick the writable ledgerdir
with the least number of active entrylogs
   - if an entrylog reaches its capacity, it will be rotated, a new entrylog
will be assigned to that ledger, and the mapping will be updated
   - if a ledgerdir becomes full, all the entrylogs in that ledgerdir should be
rotated. New entrylogs should be created for those ledgers in the available
writable ledgerdirs, and the mapping should be updated.
   - when a ledgerdir becomes writable again, it should become eligible for the
creation of new entrylogs
   - currently the bookie is not informed about the writeclose of a ledger, so
there needs to be a way to know when to remove the ledger's entry from the map
and rotate its entrylog. One simple way to handle this is to use a cache (the
Guava Cache library) with a time-based eviction policy (on last access) and
rotate the corresponding entrylog as part of the removal listener.
   - time-based eviction is simple to provide, but until an entrylog file is
rotated and flushed, its file handle is kept open and the file won't become
eligible for compaction/garbage collection. So an explicit writeclose call from
the client to the bookies in that ledger's ensemble is needed for better
handling of entrylog rotation.
   - both the time-based eviction/removal policy and the explicit writeclose
call are required, because explicit writeclose calls to bookies are not
guaranteed in all cases: for example during an ensemble change of the ledger,
after a client crash, or because the write close protocol itself is unreliable.
Advisory Write Close is explained below.
   - for this feature I need to make changes to the checkpoint logic. Currently,
with the BOOKKEEPER-564 change, we schedule a checkpoint only when the current
entrylog file is rotated, so we don't call 'flushCurrentLog' when we checkpoint.
With this feature there are going to be multiple active entrylogs, so
scheduling a checkpoint only when an entrylog file is rotated is not an option.
Instead, I need to call 'flushCurrentLogs' whenever a checkpoint is made, every
'flushinterval'.
   - the entrylogperledger feature does not change the format of the entrylog
contents in any way, so it should be possible to switch the entrylogperledger
configuration back and forth.
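
The mapping rules in the list above can be sketched as a small in-memory model. This is a minimal, single-threaded sketch under stated assumptions: `EntryLogPerLedgerSketch`, `LedgerDir`, `EntryLog`, `addEntry`, `markDirFull` and `markDirWritable` are hypothetical names, not the EntryLogger API; capacity is counted in bytes of entry payload; and no files are written, so "rotating" a log only marks it sealed. When a ledgerdir becomes full, its ledgers are remapped eagerly, as the list describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the ledgerId -> active entrylog mapping.
public class EntryLogPerLedgerSketch {
    static final class LedgerDir {
        final String name;
        boolean writable = true;
        int activeLogs = 0;
        LedgerDir(String name) { this.name = name; }
    }

    static final class EntryLog {
        final long logId;
        final LedgerDir dir;
        long size = 0;
        boolean rotated = false;
        EntryLog(long logId, LedgerDir dir) { this.logId = logId; this.dir = dir; }
    }

    private final List<LedgerDir> dirs = new ArrayList<>();
    private final Map<Long, EntryLog> ledgerToLog = new HashMap<>();
    private final long logCapacity;
    private long nextLogId = 0; // entrylog ids are sequential

    EntryLogPerLedgerSketch(long logCapacity, String... dirNames) {
        this.logCapacity = logCapacity;
        for (String name : dirNames) dirs.add(new LedgerDir(name));
    }

    // New logs go to the writable ledgerdir with the fewest active entrylogs.
    private EntryLog createLog() {
        LedgerDir dir = dirs.stream()
                .filter(d -> d.writable)
                .min(Comparator.comparingInt((LedgerDir d) -> d.activeLogs))
                .orElseThrow(() -> new IllegalStateException("no writable ledgerdir"));
        dir.activeLogs++;
        return new EntryLog(nextLogId++, dir);
    }

    // Stand-in for sealing; the real EntryLogger appends the ledger map and closes the file.
    private void rotate(EntryLog log) {
        log.rotated = true;
        log.dir.activeLogs--;
    }

    // Returns the id of the entrylog that received the entry.
    long addEntry(long ledgerId, long entrySize) {
        EntryLog log = ledgerToLog.computeIfAbsent(ledgerId, id -> createLog());
        if (log.size + entrySize > logCapacity) { // log is full: rotate and remap
            rotate(log);
            log = createLog();
            ledgerToLog.put(ledgerId, log);
        }
        log.size += entrySize;
        return log.logId;
    }

    // Rotate every active log in the full dir and give its ledgers new logs elsewhere.
    void markDirFull(String dirName) {
        setWritable(dirName, false);
        for (Map.Entry<Long, EntryLog> e : ledgerToLog.entrySet()) {
            if (e.getValue().dir.name.equals(dirName)) {
                rotate(e.getValue());
                e.setValue(createLog());
            }
        }
    }

    // A dir that is writable again becomes eligible for new entrylogs.
    void markDirWritable(String dirName) { setWritable(dirName, true); }

    private void setWritable(String dirName, boolean writable) {
        for (LedgerDir d : dirs) {
            if (d.name.equals(dirName)) d.writable = writable;
        }
    }

    long logIdOf(long ledgerId) { return ledgerToLog.get(ledgerId).logId; }
    String dirOf(long ledgerId) { return ledgerToLog.get(ledgerId).dir.name; }
}
```

Picking the least-loaded writable ledgerdir spreads the concurrently active entrylogs across disks instead of piling them onto one.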
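
The time-based eviction with a removal listener can be illustrated without the Guava dependency. The design proposes a Guava cache built with `CacheBuilder.newBuilder().expireAfterAccess(...).removalListener(...)`; the sketch below is a hypothetical standard-library stand-in (`IdleLedgerTracker`, an injectable clock, an explicit `cleanUp()`), written so that the eviction order is deterministic. Guava performs the equivalent maintenance during other cache operations or on `Cache.cleanUp()`. `invalidate` models the explicit writeclose path: like an explicit Guava invalidation, it also fires the listener, which is where the entrylog would be rotated.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a Guava cache with expireAfterAccess + removalListener.
// Keys are ledgerIds; the listener would rotate that ledger's entrylog.
public class IdleLedgerTracker<V> {
    private static final class Timed<T> {
        final T value;
        long lastAccess;
        Timed(T value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final BiConsumer<Long, V> onRemoval;
    // accessOrder = true: iteration runs from least to most recently accessed entry.
    private final LinkedHashMap<Long, Timed<V>> map = new LinkedHashMap<>(16, 0.75f, true);

    IdleLedgerTracker(long ttlMillis, LongSupplier clock, BiConsumer<Long, V> onRemoval) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.onRemoval = onRemoval;
    }

    void put(long ledgerId, V value) {
        map.put(ledgerId, new Timed<>(value, clock.getAsLong()));
    }

    // Reading refreshes the access time and moves the entry to the tail.
    V get(long ledgerId) {
        Timed<V> t = map.get(ledgerId);
        if (t == null) return null;
        t.lastAccess = clock.getAsLong();
        return t.value;
    }

    // Evict entries idle for at least ttlMillis and notify the listener for each.
    void cleanUp() {
        long now = clock.getAsLong();
        Iterator<Map.Entry<Long, Timed<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Timed<V>> e = it.next();
            if (now - e.getValue().lastAccess < ttlMillis) break; // the rest are newer
            Long key = e.getKey();
            V value = e.getValue().value;
            it.remove();
            onRemoval.accept(key, value);
        }
    }

    // Explicit removal (e.g. on a write-close signal) also fires the listener.
    void invalidate(long ledgerId) {
        Timed<V> t = map.remove(ledgerId);
        if (t != null) onRemoval.accept(ledgerId, t.value);
    }

    int size() { return map.size(); }
}
```
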
   **Advisory Write Close implementation details**
   - the advisory write close message should be sent to all the bookies of the
current ensemble when the ledger is write closed or opened for recovery.
   - the client operation will not wait for the callback of this call.
   - the callback of the advisory close operation is just a logger: it logs a
message on success, or logs the error in case of failure.
   - the bookie should pass the ledger close message to EntryLogger, which should
store it in an in-memory data structure.
   - when the next checkpoint happens, after all the entries of the ledger have
been flushed to the corresponding entrylog, EntryLogger should use the close
signal to rotate that entrylog.
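
On the client side, the advisory close is fire-and-forget. This is a minimal sketch, assuming a hypothetical per-bookie `BookieStub` interface whose `advisoryClose` returns a `CompletableFuture` (the real BookKeeper client uses its own callback types). `AdvisoryCloseSender.sendAdvisoryClose` is likewise a hypothetical name. The message goes to every bookie of the current ensemble, the method returns without waiting, and the completion callback only logs the outcome through `java.util.logging`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical client-side sender for the advisory write-close message.
public class AdvisoryCloseSender {
    private static final Logger LOG = Logger.getLogger(AdvisoryCloseSender.class.getName());

    // Hypothetical per-bookie RPC stub; not a real BookKeeper interface.
    interface BookieStub {
        String address();
        CompletableFuture<Void> advisoryClose(long ledgerId);
    }

    // Sends the close hint to every bookie of the current ensemble and returns
    // immediately; the completion callback only logs the outcome.
    static void sendAdvisoryClose(long ledgerId, List<? extends BookieStub> ensemble) {
        for (BookieStub bookie : ensemble) {
            bookie.advisoryClose(ledgerId).whenComplete((ignored, err) -> {
                if (err == null) {
                    LOG.info(() -> "advisory close of ledger " + ledgerId
                            + " acknowledged by " + bookie.address());
                } else {
                    LOG.log(Level.WARNING, err, () -> "advisory close of ledger " + ledgerId
                            + " failed on " + bookie.address());
                }
            });
        }
    }
}
```
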
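
The checkpoint change and the bookie side of the advisory close can be combined in one sketch, using hypothetical names (`CheckpointWithCloseSignals`, `ActiveLog`, `onAdvisoryClose`, `checkpoint`) rather than the real EntryLogger or checkpoint API. Each checkpoint, which would run every 'flushinterval', first flushes every active entrylog (the role 'flushCurrentLogs' plays in the design). Only then does it rotate the entrylogs of ledgers whose close signal was recorded before the checkpoint started, so a log is never sealed before its ledger's entries are flushed. No real I/O is performed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of checkpoint-driven rotation on advisory close signals.
public class CheckpointWithCloseSignals {
    // Minimal stand-in for an active entrylog; no real I/O happens here.
    static final class ActiveLog {
        final long logId;
        boolean flushed = false;
        boolean rotated = false;
        ActiveLog(long logId) { this.logId = logId; }
        void flush() { flushed = true; }  // real code would force the file to disk
        void rotate() { rotated = true; } // real code would append the ledger map and close
    }

    // Single-threaded for simplicity; a real implementation would guard this map.
    private final Map<Long, ActiveLog> ledgerToLog = new HashMap<>();
    // Close signals arrive on request threads, hence a concurrent set.
    private final Set<Long> pendingCloses = ConcurrentHashMap.newKeySet();

    void assign(long ledgerId, ActiveLog log) { ledgerToLog.put(ledgerId, log); }

    // Called when the bookie receives an advisory write close for a ledger.
    void onAdvisoryClose(long ledgerId) { pendingCloses.add(ledgerId); }

    // Returns the ids of the entrylogs rotated by this checkpoint.
    Set<Long> checkpoint() {
        // Snapshot the signals received so far; later ones wait for the next checkpoint.
        Set<Long> toClose = new HashSet<>(pendingCloses);
        pendingCloses.removeAll(toClose);

        // 1. Flush every active entrylog, not just a single "current" one.
        for (ActiveLog log : ledgerToLog.values()) log.flush();

        // 2. Entries of the closed ledgers are flushed now, so their logs can be sealed.
        Set<Long> rotated = new HashSet<>();
        for (long ledgerId : toClose) {
            ActiveLog log = ledgerToLog.remove(ledgerId);
            if (log != null) {
                log.rotate();
                rotated.add(log.logId);
            }
        }
        return rotated;
    }
}
```
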
