Vanlightly opened a new pull request #2936:
URL: https://github.com/apache/bookkeeper/pull/2936


   ### Motivation
   
   To be able to run without the journal without compromising the replication 
protocol.
   
   See #2705
   
   ### Changes
   
   Introduction of a data integrity check that prevents data loss due to
   running without the journal and auto-repairs the data.
   
   * Data integrity check
   
   The integrity check is comprised two parts: a preboot check that
   is triggered by either an unclean shutdown or an invalid cookie.
   The preboot check marks any open ledgers as both fenced and in
   limbo to prevent ledgers affected by potential data loss from
   being written to. The limbo status prevents NoSuchEntry and
   NoSuchLedger responses from being sent which avoid ledger
   truncation from any ledger recovery operations. Finally it sets
   a storage flag that a full check is required.
   
   A new data integrity check service has been to run the full
   integrity check once the bookie is running. If the service
   sees that the full check storage flag is set then it runs
   a full check. This involves scanning the index and comparing
   it against metadata to discover missing entries. Any
   missing entries are sourced from peer bookies by the
   EntryCopier and written to ledger storage.
   
   The data integrity check also has a different cookie validation
   implementation.
   
   The following configurations have been added to the conf
   file:
   - dataIntegrityCheckingEnabled=true/false. False by default.
     This config enables or disables data integrity checking.
     When set to false the legacy cookie validation is used.
   - dataIntegrityStampMissingCookies=true/false. False by
     default. This config allows the data integrity process
     to stamp new cookies if a cookie is missing from a
     directory. The full check will repair any lost data if
     the directory data was lost.
   
   * Cookie verification for data integrity checking
   
   The algorithm differs from that of LegacyCookieValidation in the
   following ways:
   
   - A empty directory isn't considered a fatal condition. It just means
     that the preboot phase of the data integrity checker must run. Once
     the preboot phase runs, it should be safe to stamp the cookies
     again.
   - Bookies are not allowed to change their identity. If they do, manual
     operator intervention is required (which is ok as it is expected
     that an operator would have to intervene to change the identity in
     the first place).
   - A missing cookie in zookeeper is only valid, if there are no cookies
     in any of the directories, as this is considered a new
     boot. Otherwise, manual operator intervention is required.
   
   * Async iterator for ledger metadata
   
   Common code for iterating over ledger metadata. There is already
   asyncProcessLedgers in LedgerManager, but that only gives the ledgerId
   and the API is nasty (it even uses ZK specific callbacks).
   
   This change adds a more modern iterator, which takes a function which
   returns a CompletableFuture. The iterator has rudimentary rate
   limiting, by limiting the number of ledgers which can be processed at
   a time. We should add something more advanced later, which takes into
   account the response time from ZK.
   
   * Add limbo state to bookie ledger representation
   
   Limbo state for a ledger means that we don't know whether we should
   have an entry for the ledger or not, which can happen when a bookie is
   started after having its disk wiped. We cannot response with a
   NoSuchEntryException or NoSuchLedgerException as this tell the client
   that we never had the requested entry, which may or may not be true,
   but if we tell it to the client, the client will act like it's true
   and possible mark the end of the ledger at an incorrect point.
   
   This change also adds locking to LedgerMetadataIndex. Previously it
   relied on the good graces to the calling code to avoid modifying the
   same ledger concurrently. Now that we are also using the index to
   store limbo state, we can't be so blasé.
   
   * Add entryExists call to ledgerStorage
   
   Currently the only way to check if an entry exists in the storage is
   to try to read the entry. However, this means pulling data out of the
   entrylog, which it should be sufficient to check that the entry exists
   in the index.
   
   This change adds the entryExists call to ledgerStorage. This has only
   been implemented for DbLedgerStorage. The implementation for the
   others should be trivial, but it needs to be tested.
   
   * Pregenerate the writeset from ledger metadata
   
   The bookkeeper client uses DistributionSchedule (of which
   RoundRobinDistributionSchedule is the only impl) to decide which
   members of the ensemble it writes an entry to. This writeset is
   generated for each entry. However, there is only |ensemble| possible
   writesets, so we should pregenerate them for the ledger and stop
   trashing memory.
   
   WriteSets represent a set of pregenerated writesets as would be
   otherwise generated from the distribution schedule. The constructor
   takes a list of indices (which should be generated based on the list
   of bookies in the ensemble), which specifies the preferred order that
   bookies should be tried for reads.
   
   * Storage state flags for LedgerStorage
   
   If a bookie crashes in the middle of a full integrity check, it needs
   to know to start it again when it reboots. For this, we need to
   persist some flag to persistent storage.
   
   This change adds persistent flags to the ledger storage
   interface. Multiple flags can be added in future, for example to mark
   the storage as dirty on boot, so we can detect non-clean shutdown.
   
   Flags are only implemented for DbLedgerStorage. The flags are stored
   in the metadata index, with a negative ledger id as key. The key of
   the storage selected for ledger 0 is used. This does mean flags will
   be lost if there is a change in the storage disk configuration, but
   data integrity checks will run in this case regardless.
   
   * EUNKNOWN code
   
   A new response code has been added to communicate that the state
   of an entry is unknown due the ledger being in limbo.
   
   * Added bookie unclean shutdown detection
   
   Adds unclean shutdown detection. When running with journal
   writes disabled and data integrity checking enabled, if
   the prior shutdown was unclean (not a graceful shutdown)
   then the data integrity checks are triggered. These checks
   avoid additional data loss scenarios and repair any lost
   data caused by the loss of unflushed data at the time
   of the unclean shutdown.
   
   The BookieServer registers start-up and shutdown with the
   UncleanShutdownDetection class. This class adds a dirty
   file to each ledger dir on registering start-up and clears
   all these files on registering shutdown. The presence of
   any of these files on boot-up indicates the prior shutdown
   was unclean.
   
   Master Issue: #2705
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to