ivankelly opened a new issue #570: Multiple active entrylogs
URL: https://github.com/apache/bookkeeper/issues/570
   JIRA: https://issues.apache.org/jira/browse/BOOKKEEPER-1041
   Reporter: Venkateswararao Jujjuri (JV) @jvrao
   Current BookKeeper is tuned for rotational HDDs. It has one active entrylog, 
and entries from all ledgers go to that entrylog until it is rotated out. 
This is ideal for HDDs, where seeks that move the head all over the disk 
platter are very expensive. But it is inefficient for SSDs, since each SSD can 
handle multiple parallel writers. It is also inefficient for compaction, as it 
causes write amplification and poor disk space usage.
   Our proposal is to have multiple active entrylogs and a configuration 
parameter for how many parallel entrylogs the system can have. This way one can 
configure the bookie to have fewer ledgers (maybe one) per entrylog.
   ### Comments from JIRA
   *Enrico Olivelli* 2017-04-20T07:28:28.619+0000
    But this is very inefficient for HDDs
   Did you mean SSD?
   *Venkateswararao Jujjuri (JV)* 2017-04-21T16:42:29.203+0000
   Yes, I meant SSD. Corrected. Thanks.
   *Charan Reddy Guttapalem* 2017-05-18T15:01:22.566+0000
   In the Bookie's EntryLogger, there is only one active entryLog at a time, 
and entries from all ledgers go to that entrylog. This is ideal for HDDs, where 
file syncs, seeks, and moving the head all over the disk platter are very 
expensive. But a single active EntryLog is inefficient for SSDs, as each SSD 
can handle multiple parallel writers. A single active EntryLog is also 
inefficient for compaction, irrespective of LedgerStorage type 
(interleaved/sorted), since entries of multiple ledgers end up in one entrylog.
   Also, in SortedLedgerStorage, the addEntry request flushes the 
EntryMemTable if it reaches the size limit. Because of this we are observing 
unpredictable tail latency for addEntry requests. When an EntryMemTable 
snapshot (64 MB) is flushed all at once, it may affect the journal addEntry 
latency. Also, if the rate of new add requests surpasses the rate at which the 
EntryMemTable's previous snapshot is flushed, then at some point the current 
EntryMemTable map will reach the limit, and since the previous snapshot flush 
is still in progress, EntryMemTable will throttle new add requests, which 
affects addEntry latency.
   The main purpose of this feature is to make garbage collection efficient by 
minimizing the amount of compaction required and by reclaiming deleted ledgers' 
space sooner. This feature also lays the groundwork for switching from 
SortedLedgerStorage to InterleavedLedgerStorage and getting predictable tail 
latency. 
   So the proposal here is to have multiple active entrylogs, which will help 
with compaction performance and make SortedLedgerStorage redundant.
   Design Overview:
   - is to have server configuration specifying number of active entry logs per 
   - for backward compatibility (to keep the existing behaviour) that config 
can be set to 0. 
   - a round-robin method will be used for choosing the active entry log for 
the current ledger in the EntryLogger.addEntry method
   - if the total number of active entrylogs is greater than or equal to the 
number of active ledgers, then each ledger gets an almost exclusive entrylog
   - For implementing the Round-Robin approach, we need to maintain a mapping 
of ledgerId to SlotId
   - there will be numberofledgerdirs*numberofactiveentrylogsperledgerdir 
slots. A slot is mapped to a ledgerdir, but the active entrylog of that slot 
will be rotated when it reaches capacity.
   - By knowing the SlotId we can get the entryLogId associated with that slot.
   - If there is no entry for the current ledger in the map, then we pick the 
next slot in order and add the mapping to the map.
   - Since the Bookie won't be informed about the write-close of a ledger, 
there is no easy way to know when to remove the mapping from the map. 
Considering it is just a <long ledgerid, int slotid> map entry, we can 
compromise on the eviction policy. We can just use a cache with a time-based 
eviction policy keyed on last access
   - If a ledgerdir becomes full, then all the slots that have entrylogs in 
that ledgerdir should become inactive. The existing mappings of active ledgers 
to these slots (active entrylogs) should be updated to point to available 
active slots.
   - When a ledgerdir becomes writable again, the slots that were inactive 
should be made active and become eligible for round-robin distribution
   - For this feature I need to make changes to the checkpoint logic. 
Currently, with the BOOKKEEPER-564 change, we schedule a checkpoint only when 
the current entrylog file is rotated, so we don't call 'flushCurrentLog' when 
we checkpoint. But with this feature there are going to be multiple active 
entrylogs, so scheduling a checkpoint when an entrylog file is rotated is not 
an option. So I need to call flushCurrentLogs when a checkpoint is made every 
'flushinterval' 
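
   The slot bookkeeping in the design overview above can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the actual 
patch: the class `EntryLogSlotAllocator` and its methods are hypothetical names, 
slots are assumed to be numbered so that consecutive slots share a ledgerdir, a 
plain `HashMap` stands in for the cache with time-based eviction, and mappings 
that point at a full ledgerdir are remapped lazily on the next lookup rather 
than eagerly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of ledgerId -> slot assignment; not BookKeeper code. */
final class EntryLogSlotAllocator {
    private final int slotsPerDir;
    private final int totalSlots;
    private final boolean[] dirWritable;
    // Stand-in for a cache that evicts entries by last access time.
    private final Map<Long, Integer> ledgerToSlot = new HashMap<>();
    private int nextSlot = 0;

    EntryLogSlotAllocator(int numLedgerDirs, int activeEntryLogsPerLedgerDir) {
        // The legacy setting of 0 (single entrylog) is outside this sketch.
        if (numLedgerDirs <= 0 || activeEntryLogsPerLedgerDir <= 0) {
            throw new IllegalArgumentException("both counts must be positive");
        }
        this.slotsPerDir = activeEntryLogsPerLedgerDir;
        this.totalSlots = numLedgerDirs * activeEntryLogsPerLedgerDir;
        this.dirWritable = new boolean[numLedgerDirs];
        Arrays.fill(dirWritable, true);
    }

    int totalSlots() {
        return totalSlots;
    }

    /** Assumed layout: slots [d * slotsPerDir, (d + 1) * slotsPerDir) live in dir d. */
    int dirOfSlot(int slotId) {
        return slotId / slotsPerDir;
    }

    /** Marks a ledgerdir full (false) or writable again (true). */
    synchronized void setDirWritable(int dirIndex, boolean writable) {
        dirWritable[dirIndex] = writable;
    }

    /** Returns the slot for a ledger, assigning the next active slot if needed. */
    synchronized int slotFor(long ledgerId) {
        Integer existing = ledgerToSlot.get(ledgerId);
        if (existing != null && dirWritable[dirOfSlot(existing)]) {
            return existing;
        }
        // No mapping yet, or the mapped slot is inactive: pick round-robin.
        for (int i = 0; i < totalSlots; i++) {
            int candidate = (nextSlot + i) % totalSlots;
            if (dirWritable[dirOfSlot(candidate)]) {
                nextSlot = (candidate + 1) % totalSlots;
                ledgerToSlot.put(ledgerId, candidate);
                return candidate;
            }
        }
        throw new IllegalStateException("no writable ledgerdir available");
    }
}
```

   With 2 ledgerdirs and 2 active entrylogs per ledgerdir this gives 4 slots. 
New ledgers take slots in order, a ledger that is already mapped keeps its 
slot, and a ledger whose slot sits in a full ledgerdir is moved to the next 
active slot on its next add.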
   *Enrico Olivelli* 2017-05-19T06:49:35.104+0000
   [~jujjuri] [~reddychara...@gmail.com]
   This sounds very interesting. Now I can see clearly why JV wrote on the 
mailing list that maybe clients could send some hint to the bookies that a 
ledger has been gracefully deleted/closed
   *Charan Reddy Guttapalem* 2017-06-02T00:22:59.045+0000
   [~si...@apache.org] created a writeup for this work item and discussed it 
in the last community call (May 18th)

