reddycharan commented on issue #570: Multiple active entrylogs
URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-368185454

**Design Overview**

**Entrylog per Ledger**

As the name suggests, in this approach a bookie has an active entrylog dedicated to each active ledger. It is not, however, a strictly enforced one-to-one mapping of ledger to entrylog. A strict one-to-one relationship between ledger and entrylog in a bookie is not possible for several reasons: a ledgerdir might become full, an entrylog might reach its capacity, segmentation/replication can happen, the bookie might crash, and so on.

Besides, once an entrylog is rotated it cannot be reopened. While rotating an entrylog file, EntryLogger appends the ledger map at the end of the entry log and updates the entrylog file with the offset and size of that map, which effectively seals the entrylog file. EntryLogger also maintains a pointer called 'leastUnflushedLogId', which is the least entrylogid that is not yet rotated and closed. GC considers all entrylogs with a logid less than 'leastUnflushedLogId' (entrylogids are sequential numbers) eligible for compaction/garbage collection. In summary, once an entrylog is rotated and closed, we need to maintain immutable semantics on the entrylog file.

So instead we can provide a relaxed constraint, where an entrylog is dedicated/committed to a ledger but not the other way around.
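To make the 'leastUnflushedLogId' rule concrete, here is a minimal sketch of how the pointer could behave once several entrylogs are active at the same time. `LogIdTracker` and its methods are hypothetical names used only for illustration; this is not BookKeeper's EntryLogger code, and it assumes the pointer is simply the smallest logid that has not been sealed yet.

```java
import java.util.TreeSet;

/** Hypothetical tracker for illustration; not BookKeeper's EntryLogger. */
final class LogIdTracker {
    // Ids of entry logs that are created but not yet rotated and sealed.
    private final TreeSet<Long> activeLogIds = new TreeSet<>();
    // Entry log ids are sequential numbers.
    private long nextLogId = 0;

    /** Allocates the next sequential log id and marks it active. */
    synchronized long createLog() {
        long id = nextLogId++;
        activeLogIds.add(id);
        return id;
    }

    /** Marks a log as rotated and sealed; a sealed log is never reopened. */
    synchronized void seal(long logId) {
        if (!activeLogIds.remove(logId)) {
            throw new IllegalStateException("log " + logId + " is not active");
        }
    }

    /** Smallest id not yet sealed, or the next id to allocate if none is active. */
    synchronized long leastUnflushedLogId() {
        return activeLogIds.isEmpty() ? nextLogId : activeLogIds.first();
    }

    /** Only logs strictly below the pointer are candidates for compaction/GC. */
    synchronized boolean isGcEligible(long logId) {
        return logId < leastUnflushedLogId();
    }
}
```

Under this assumption, a sealed log with a higher id stays out of reach of GC for as long as an older log is still active, which is one reason idle entrylogs should be rotated promptly.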
So in most cases there would be just one entrylog for a ledger in a bookie, but entries of a ledger might end up in different entrylogs in situations such as the following: when an entrylog reaches capacity, it is rotated and a new one is created for that ledger; when a ledgerdir is full, all the entrylogs in that ledgerdir are rotated and new ones are created for those ledgers; because of segmentation and replication, various segments of a ledger might end up in different entrylogs; and after a bookie crash, a new entrylog is created while replaying the journal for the entries left over in the journal.

To summarize this approach briefly:

- A server configuration option specifies whether entrylogperledger is enabled. For the previous behavior (one active entrylog), that option can be set to false.
- When EntryLogger receives an addEntry call, it needs to know the entrylog for the current ledger, so EntryLogger needs to maintain a mapping of ledgerId to entrylogid. If the in-memory map doesn't contain an entry for the ledger, EntryLogger creates a new entrylog and adds the mapping of ledgerId to entrylog.
- When creating a new entrylog, EntryLogger picks the writable ledgerdir with the least number of active entrylogs.
- If an entrylog reaches capacity, it is rotated, a new entrylog is assigned to that ledger, and the mapping is updated.
- If a ledgerdir becomes full, all the entrylogs in that ledgerdir should be rotated. New entrylogs should be created in the available writable ledgerdirs for those ledgers, and the mapping should be updated.
- When a ledgerdir becomes writable again, it should become eligible for creation of new entrylogs.
- Currently the bookie is not informed about the write close of a ledger, so there needs to be a way to know when to remove the mapping entry from the map and rotate the entrylog.
One simple way to handle it is to use a cache (Guava Cache library) with a time-based eviction policy (on last access) and to rotate the corresponding entrylog in the removal listener.
- Time-based eviction is simple to provide, but until an entrylog file is rotated and flushed, its file handle is kept open and it does not become eligible for compaction/garbage collection. So an explicit write close call from the client to the bookies in that ledger's ensemble is needed for better handling of entrylogs.
- Both the time-based eviction with its removal listener and the explicit write close call are required, because explicit write close calls to bookies are not guaranteed in all cases, for example during an ensemble change of the ledger, after a client crash, or with an unreliable write close protocol. Advisory Write Close is explained in detail below.
- For this feature I need to make changes to the checkpoint logic. Currently, with the BOOKKEEPER-564 change, we schedule a checkpoint only when the current entrylog file is rotated, so we don't call 'flushCurrentLog' when we checkpoint. But for this feature, since there are going to be multiple active entrylogs, scheduling a checkpoint when an entrylog file is rotated is not an option. So I need to call flushCurrentLogs when a checkpoint is made, once every 'flushinterval' period.
- With the entrylogperledger feature we are not changing the format of the entrylog contents in any way, so it should be possible to switch the entrylogperledger configuration back and forth.

**Advisory Write Close implementation details**

- An advisory write close message should be sent to all the bookies of the current ensemble when a ledger is write closed or recover opened.
- The client operation will not wait for the callback of this call.
- The callback of the advisory close operation is just a logger: it logs a message on success and logs an error in case of any error.
- The bookie should communicate the ledger close message to the entrylogger, which should store that message in an in-memory data structure.
- When the next checkpoint happens, after flushing all the entries of the ledger to the corresponding entrylog, the entrylogger should use the close signal to rotate the corresponding entrylog.
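The bookkeeping described in the two lists above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: `EntryLogPerLedgerSketch`, its method names, and the integer ledgerdir indexes are hypothetical and are not BookKeeper APIs. The sketch replaces the Guava cache with a plain map whose idle entries are checked at checkpoint time, takes the clock as a parameter, and creates the replacement entrylog lazily on the next addEntry rather than immediately when a ledgerdir fills up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of entrylog-per-ledger state; not BookKeeper's EntryLogger. */
final class EntryLogPerLedgerSketch {
    private static final class ActiveLog {
        final long logId;
        final int dir;
        long size;
        long lastAccessMs;
        ActiveLog(long logId, int dir) { this.logId = logId; this.dir = dir; }
    }

    private final Map<Long, ActiveLog> ledgerToLog = new HashMap<>();
    private final Set<Long> closeAdvised = new HashSet<>();   // advisory close signals
    private final List<Long> rotatedLogIds = new ArrayList<>();
    private final int[] activeLogsPerDir;
    private final boolean[] dirWritable;
    private final long logSizeLimit;
    private final long idleTimeoutMs;
    private long nextLogId = 0;

    EntryLogPerLedgerSketch(int numDirs, long logSizeLimit, long idleTimeoutMs) {
        this.activeLogsPerDir = new int[numDirs];
        this.dirWritable = new boolean[numDirs];
        Arrays.fill(dirWritable, true);
        this.logSizeLimit = logSizeLimit;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Routes an entry to its ledger's log, rotating or creating one as needed. */
    long addEntry(long ledgerId, long entrySize, long nowMs) {
        ActiveLog log = ledgerToLog.get(ledgerId);
        if (log != null && log.size + entrySize > logSizeLimit) {
            rotate(log);          // capacity reached: seal and replace below
            log = null;
        }
        if (log == null) {
            log = new ActiveLog(nextLogId++, pickDir());
            activeLogsPerDir[log.dir]++;
            ledgerToLog.put(ledgerId, log);
        }
        log.size += entrySize;
        log.lastAccessMs = nowMs;
        return log.logId;
    }

    /** Records the advisory close; the rotation itself waits for the next checkpoint. */
    void adviseClose(long ledgerId) { closeAdvised.add(ledgerId); }

    /** Rotates every active log in a full ledgerdir and stops placing new logs there. */
    void markDirFull(int dir) {
        dirWritable[dir] = false;
        ledgerToLog.values().removeIf(log -> {
            if (log.dir != dir) return false;
            rotate(log);
            return true;
        });
    }

    void markDirWritable(int dir) { dirWritable[dir] = true; }

    /** At checkpoint, rotates logs of ledgers that were closed or have been idle too long. */
    void checkpoint(long nowMs) {
        ledgerToLog.entrySet().removeIf(e -> {
            boolean idle = nowMs - e.getValue().lastAccessMs >= idleTimeoutMs;
            if (!idle && !closeAdvised.contains(e.getKey())) return false;
            rotate(e.getValue());
            return true;
        });
        closeAdvised.clear();
    }

    List<Long> rotatedLogIds() { return rotatedLogIds; }

    /** Picks the writable ledgerdir with the fewest active logs. */
    private int pickDir() {
        int best = -1;
        for (int d = 0; d < dirWritable.length; d++) {
            if (dirWritable[d] && (best < 0 || activeLogsPerDir[d] < activeLogsPerDir[best])) {
                best = d;
            }
        }
        if (best < 0) throw new IllegalStateException("no writable ledgerdir");
        return best;
    }

    private void rotate(ActiveLog log) {
        activeLogsPerDir[log.dir]--;
        rotatedLogIds.add(log.logId);
    }
}
```

Deferring rotation to `checkpoint` mirrors the last bullet: the close signal is only acted on once the ledger's entries have been flushed. A real implementation would also need to flush and seal the files and handle concurrency, which this model leaves out.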
