reddycharan commented on issue #570: Multiple active entrylogs
**Entrylog per Ledger**
As the name suggests, in this approach a bookie keeps an active entrylog
dedicated to each active ledger. It is not, however, a strict one-to-one
mapping of ledger to entrylog. A strict one-to-one relationship between ledger
and entrylog in a bookie is not possible for several reasons (a ledgerdir might
become full, an entrylog might reach its capacity, segmentation/replication can
happen, the bookie might crash, ...). Besides, once an entrylog is rotated it
cannot be reopened: while rotating an entrylog file, EntryLogger appends the
ledger map at the end of the entry log and updates the entrylog file with the
offset and size of the map, which effectively seals the file. EntryLogger also
maintains a pointer called 'leastUnflushedLogId', which is the least entrylogid
that is not yet rotated and closed, and GC considers all entrylogs with a logid
less than 'leastUnflushedLogId' (entrylogids are sequential numbers) eligible
for compaction/garbage collection. In summary, once an entrylog is rotated and
closed we need to maintain immutable semantics on the entrylog file.
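
To make the sealing rule concrete, here is a minimal Java sketch of how 'leastUnflushedLogId' gates GC eligibility. The class `SealedLogSketch` and all of its methods are hypothetical names for illustration, not BookKeeper's actual EntryLogger API, and the model only tracks a sealed flag per logid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of sealed entry logs; not BookKeeper's EntryLogger. */
class SealedLogSketch {
    // logId -> sealed flag; TreeMap keeps the sequential log ids in order.
    private final TreeMap<Long, Boolean> sealedByLogId = new TreeMap<>();
    private long nextLogId = 0;

    /** Opens a new active (unsealed) entry log and returns its id. */
    long createLog() {
        long id = nextLogId++;
        sealedByLogId.put(id, false);
        return id;
    }

    /** Rotation seals the log; this model offers no way to unseal it. */
    void rotate(long logId) {
        sealedByLogId.put(logId, true);
    }

    boolean isWritable(long logId) {
        return Boolean.FALSE.equals(sealedByLogId.get(logId));
    }

    /** Smallest log id that is still active, or nextLogId if every log is sealed. */
    long leastUnflushedLogId() {
        for (Map.Entry<Long, Boolean> e : sealedByLogId.entrySet()) {
            if (!e.getValue()) {
                return e.getKey();
            }
        }
        return nextLogId;
    }

    /** Logs with an id below leastUnflushedLogId are candidates for compaction/GC. */
    List<Long> gcEligible() {
        return new ArrayList<>(sealedByLogId.headMap(leastUnflushedLogId()).keySet());
    }
}
```

Under this model, a sealed log with a higher id than some still-active log is not yet GC-eligible, which is one reason idle active entrylogs should be rotated promptly once there are several of them.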
So instead we can provide a relaxed constraint where an entrylog is
dedicated/committed to a ledger but not the other way around. In most cases
there would be just one entrylog for a ledger in a bookie, but entries of a
ledger might end up in different entrylogs in a few situations. When an entrylog
reaches capacity, it is rotated and a new one is created for that ledger. When a
ledgerdir is full, all the entrylogs in that ledgerdir are rotated and new ones
are created for those ledgers. Because of segmentation and replication, various
segments of a ledger might end up in different entrylogs. And after a bookie
crash, replaying the journal creates a new entrylog for the leftover entries in
the journal. To summarize this approach briefly:
- a server configuration option specifies whether entrylogperledger is enabled
- for the previous behavior (one active entrylog) that config can be set to
- when EntryLogger receives an addEntry call, it needs to know the entry log
for that entry's ledger
- so EntryLogger needs to maintain an in-memory map of ledgerId to entrylogid.
If the map doesn't contain an entry for the ledger, then
EntryLogger will create a new Entrylog and add the mapping of ledgerId to
- to create a new entrylog, EntryLogger will pick the writable ledgerdir
with the least number of active entrylogs
- if an entrylog reaches capacity, it will be rotated, a new entrylog will be
assigned to that ledger, and the mapping will be updated
- if a ledgerdir becomes full, all the entrylogs in that ledgerdir should be
rotated. New entrylogs should be created in the available writable ledgerdirs
for those ledgers and the mapping should be updated.
- when a ledgerdir becomes writable again, it should become eligible for
creation of new entrylogs
- Currently the Bookie is not informed about the write close of a ledger, so
there needs to be a way to know when to remove the mapping entry from the map
and rotate the entrylog. One simple way to handle it is to use a cache (Guava
Cache library) with a time-based eviction policy (on last access) and rotate
the corresponding entrylog in the cache's removal listener.
- A time-based eviction policy is simple to provide, but until the entrylog
file is rotated and flushed, its file handle is kept open and it won't become
eligible for compaction/garbage collection. So an explicit writeclose call
from the client to the bookie ensemble of that ledger is needed for better handling of
- Both the time-based eviction and removal policy and the explicit writeclose
call are required, because explicit write close calls to bookies are not
guaranteed in all cases, such as an ensemble change of the ledger, a client
crash, or an unreliable write close protocol. Advisory Write Close is explained below in
- For this feature I need to make changes to the checkpoint logic. Currently,
with the BOOKKEEPER-564 change, we schedule a checkpoint only when the current
entrylog file is rotated, so we don't call 'flushCurrentLog' when we checkpoint.
But for this feature, since there are going to be multiple active entrylogs,
scheduling a checkpoint when an entrylog file is rotated is not an option. So I
need to call flushCurrentLogs when a checkpoint is made for every 'flushinterval'
- With the entrylogperledger feature we are not changing the format of the
entrylog contents in any way, so it should be possible to switch the
entrylogperledger configuration back and forth.
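
The mapping rules listed above can be sketched in Java as follows. This is only an illustration under stated assumptions: `EntryLogPerLedgerSketch` and every method on it are hypothetical names, rotation is recorded in a list instead of touching disk, an explicit idle-timeout check (`evictIdle`) stands in for the Guava cache with eviction on last access, and after a ledgerdir fills up the new entrylog is created lazily on the ledger's next addEntry rather than immediately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of the ledgerId -> entry log mapping; not BookKeeper's EntryLogger. */
class EntryLogPerLedgerSketch {
    /** State of one active entry log (all fields are illustrative). */
    static final class ActiveLog {
        final long logId;
        final int dir;      // index of the ledgerdir holding this log
        long sizeBytes;
        long lastAccessMs;
        ActiveLog(long logId, int dir) { this.logId = logId; this.dir = dir; }
    }

    private final Map<Long, ActiveLog> activeByLedger = new HashMap<>();
    private final boolean[] dirWritable;
    private final long capacityBytes;
    private final long idleTimeoutMs;
    private long nextLogId = 0;
    /** Ids of logs that were rotated (sealed), recorded here instead of touching disk. */
    final List<Long> rotatedLogIds = new ArrayList<>();

    EntryLogPerLedgerSketch(int numDirs, long capacityBytes, long idleTimeoutMs) {
        this.dirWritable = new boolean[numDirs];
        Arrays.fill(dirWritable, true);
        this.capacityBytes = capacityBytes;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Picks the writable ledgerdir with the fewest active entry logs. */
    private int pickDir() {
        int[] counts = new int[dirWritable.length];
        for (ActiveLog l : activeByLedger.values()) counts[l.dir]++;
        int best = -1;
        for (int d = 0; d < counts.length; d++) {
            if (dirWritable[d] && (best < 0 || counts[d] < counts[best])) best = d;
        }
        if (best < 0) throw new IllegalStateException("no writable ledgerdir");
        return best;
    }

    /** Rotates and unmaps every active log matching the predicate. */
    private void rotateIf(Predicate<ActiveLog> shouldRotate) {
        Iterator<ActiveLog> it = activeByLedger.values().iterator();
        while (it.hasNext()) {
            ActiveLog l = it.next();
            if (shouldRotate.test(l)) {
                rotatedLogIds.add(l.logId);
                it.remove();
            }
        }
    }

    /** Returns the id of the log that receives the entry, creating or rotating logs as needed. */
    long addEntry(long ledgerId, long entryBytes, long nowMs) {
        ActiveLog log = activeByLedger.get(ledgerId);
        if (log != null && log.sizeBytes + entryBytes > capacityBytes) {
            rotatedLogIds.add(log.logId);      // capacity reached: seal the old log
            activeByLedger.remove(ledgerId);
            log = null;
        }
        if (log == null) {
            log = new ActiveLog(nextLogId++, pickDir());
            activeByLedger.put(ledgerId, log);
        }
        log.sizeBytes += entryBytes;
        log.lastAccessMs = nowMs;
        return log.logId;
    }

    /** Ledgerdir became full: rotate every active log that lives in it. */
    void markDirFull(int dir) { dirWritable[dir] = false; rotateIf(l -> l.dir == dir); }

    /** Ledgerdir is writable again and eligible for new entry logs. */
    void markDirWritable(int dir) { dirWritable[dir] = true; }

    /** Stand-in for time-based cache eviction: rotate logs idle for idleTimeoutMs or more. */
    void evictIdle(long nowMs) { rotateIf(l -> nowMs - l.lastAccessMs >= idleTimeoutMs); }

    /** Ledgerdir index of the ledger's active log, or -1 if it has none. */
    int dirOf(long ledgerId) {
        ActiveLog l = activeByLedger.get(ledgerId);
        return l == null ? -1 : l.dir;
    }
}
```

The sketch is single-threaded and takes the current time as a parameter so its behavior is deterministic; a real implementation would need synchronization around the map and would do the actual file rotation where this code appends to `rotatedLogIds`.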
**Advisory Write Close implementation details**
- The advisory write close message should be sent to all the bookies of the
current ensemble when a ledger is write closed or recover opened.
- The client operation will not wait for the callback of this call.
- The callback of the advisory close operation just logs: a message on
success, or an error if the call fails.
- The bookie should pass the ledger close message on to EntryLogger, which
should store it in an in-memory data structure.
- When the next checkpoint happens, after all the entries of the ledger are
flushed to the corresponding entrylog, EntryLogger should use the close signal
to rotate that entrylog.
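
The bookie-side handling described above might look roughly like this in Java. `AdvisoryCloseSketch` and its methods are hypothetical names, flushing and rotation are recorded in lists rather than performed on files, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of handling advisory write close at checkpoint time. */
class AdvisoryCloseSketch {
    private final Map<Long, Long> activeLogByLedger = new HashMap<>();
    // Close signals received since the last checkpoint, kept in memory only.
    private final Set<Long> closedLedgers = new HashSet<>();
    final List<Long> flushedLogIds = new ArrayList<>();
    final List<Long> rotatedLogIds = new ArrayList<>();

    /** Registers the active entry log for a ledger. */
    void assign(long ledgerId, long logId) {
        activeLogByLedger.put(ledgerId, logId);
    }

    /** Records the advisory close; nothing is rotated yet. */
    void onAdvisoryWriteClose(long ledgerId) {
        closedLedgers.add(ledgerId);
    }

    /** Checkpoint: flush every active log, then rotate the logs of closed ledgers. */
    void checkpoint() {
        flushedLogIds.addAll(activeLogByLedger.values());
        for (long ledgerId : closedLedgers) {
            Long logId = activeLogByLedger.remove(ledgerId);
            if (logId != null) {
                rotatedLogIds.add(logId);
            }
        }
        closedLedgers.clear();
    }
}
```

Deferring rotation to the checkpoint keeps the close message cheap to process, and a close signal for a ledger with no active entrylog on this bookie is simply dropped, which fits the advisory nature of the call.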