[bookkeeper] branch master updated: BP-35: 128 bits support

sijie Thu, 06 Sep 2018 22:02:17 -0700

This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git



The following commit(s) were added to refs/heads/master by this push:
     new 992db73  BP-35: 128 bits support
992db73 is described below

commit 992db732f42e5d4425ec2f986fb42e29515a182b
Author: Sijie Guo <[email protected]>
AuthorDate: Thu Sep 6 22:02:04 2018 -0700

    BP-35: 128 bits support
    
    Descriptions of the changes in this PR:
    
    ### Motivation
    
    BookKeeper coordinates with a metadata store to generate a cluster wide `ledgerId`.
    Currently this is a signed `64 bit` number (effectively 63 bits). This method works
    great because we have a centralized metadata store for coordinating the id generation.
    However this method may not scale as the cluster size and number of ledgers grow.
    
    [Universally unique identifier - Wikipedia](https://en.wikipedia.org/wiki/Globally_unique_identifier)
    is a preferred way to generate decentralized globally unique IDs and it takes `128 bits`.
    This method can scale well as it doesn't need a centralized coordination.
    
    This BP proposes the changes for increasing ledger id from `63 bits` to `128 bits`.
    
    Master Issue: #603
    
    
    
    Author: Sijie Guo <[email protected]>
    
    Reviewers: Andrey Yegorov <None>, Enrico Olivelli <[email protected]>, Jia Zhai <None>, Venkateswararao Jujjuri (JV) <None>
    
    This closes #1611 from sijie/bp_35_128bits
---
 site/bps/BP-35-128-bits-support.md     | 399 +++++++++++++++++++++++++++++++++
 site/community/bookkeeper_proposals.md |   3 +-
 2 files changed, 401 insertions(+), 1 deletion(-)

diff --git a/site/bps/BP-35-128-bits-support.md b/site/bps/BP-35-128-bits-support.md
new file mode 100644
index 0000000..cae6b8c
--- /dev/null
+++ b/site/bps/BP-35-128-bits-support.md
@@ -0,0 +1,399 @@
+---
+title: "BP-35: 128 bits support"
+issue: https://github.com/apache/bookkeeper/603
+state: "Under Discussion"
+design doc: https://docs.google.com/document/d/1cu54dNSV2ZrdWCi40LcyX8NxXGRCW0609T_ewmK9BWM
+release: "4.9.0"
+---
+
+### Motivation
+
+BookKeeper coordinates with a metadata store to generate a cluster wide `ledgerId`.
+Currently this is a signed `64 bit` number (effectively 63 bits). This method works
+great because we have a centralized metadata store for coordinating the id generation.
+However this method may not scale as the cluster size and number of ledgers grow.
+
+[Universally unique identifier - Wikipedia](https://en.wikipedia.org/wiki/Globally_unique_identifier)
+is a preferred way to generate decentralized globally unique IDs and it takes `128 bits`.
+This method can scale well as it doesn't need a centralized coordination.
+
+This BP proposes the changes for increasing ledger id from `63 bits` to `128 bits`.
+
+### 128 bits
+
+Since there is no native support for `128 bits` in either Java or
+[Protobuf](https://github.com/google/protobuf/issues/2180), we have to break `128 bits`
+into 2 `64 bits` numbers for representing the `128 bits` id:
+
+- ledger-id-msb: the most significant 64 bits, bit 64 - 127
+- ledger-id-lsb: the least significant 64 bits, bit 0 - 63
+
+For backward compatibility, the `ledger-id-lsb` is the current `64 bits` ledger-id.
+The `ledger-id-msb` will be added as a new field in both API and protocol.
+
+I am proposing to call `ledger-id-msb` the `ledger-scope-id`. So the current 64 bits `ledgerId` and
+the newly introduced 64 bits `ledgerScopeId` together will form the new `128 bits` ledger id.
+
+The default `ledgerScopeId` is `0`. That means any ledgers created prior to this change are allocated
+under scope `0`. Hence it maintains backward compatibility during upgrade.
+
+The combination of `ledgerScopeId` and `ledgerId` forms the `128 bits` ledger id. We can introduce a
+hex representation of this `128 bits` ledger id - `ledgerQualifiedName`. This `ledgerQualifiedName` can
+be useful for CLI tooling, REST API and troubleshooting purposes. The API internally can convert
+`ledgerQualifiedName` to `ledgerScopeId` and `ledgerId`.
+
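The conversion between `ledgerQualifiedName` and the `<ledgerScopeId, ledgerId>` pair could look roughly like the following. This is a minimal sketch, not part of the proposal: the class name `LedgerQualifiedName`, its method names, and the choice of 32 lowercase hex digits with the scope id first are all assumptions made for illustration.

```java
/** Hypothetical helper; the proposal names the concept but does not define this class. */
final class LedgerQualifiedName {

    private LedgerQualifiedName() {}

    /** Formats the two 64-bit halves as 32 lowercase hex digits, scope id first (assumed order). */
    static String format(long ledgerScopeId, long ledgerId) {
        // %016x prints a long as unsigned two's-complement hex, zero-padded to 16 digits.
        return String.format("%016x%016x", ledgerScopeId, ledgerId);
    }

    /** Parses a 32-digit hex name back into {ledgerScopeId, ledgerId}. */
    static long[] parse(String qualifiedName) {
        if (qualifiedName.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits: " + qualifiedName);
        }
        // parseUnsignedLong accepts the full unsigned 64-bit range, so negative ids round-trip.
        long ledgerScopeId = Long.parseUnsignedLong(qualifiedName.substring(0, 16), 16);
        long ledgerId = Long.parseUnsignedLong(qualifiedName.substring(16), 16);
        return new long[] {ledgerScopeId, ledgerId};
    }

    public static void main(String[] args) {
        String name = format(0L, 42L);
        long[] parts = parse(name);
        System.out.println(name + " -> scope " + parts[0] + ", ledger " + parts[1]);
    }
}
```

Under this sketch a ledger created before the change (scope `0`) gets a name whose first 16 digits are all zero, so old and new ledgers share one textual form.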
+### Public Interfaces
+
+#### API Change
+
+The API will introduce `ledgerScopeId` across the interfaces. This field will be optional and default to `0`.
+
+##### Handle
+
+Introduce a new method `getScopeId` for representing the scope id (the most significant 64 bits of the `128 bits` ledger id).
+
+```java
+public interface Handle extends AutoCloseable {
+
+  ...
+
+  /**
+   * Return the ledger scope id. The most significant 64 bits of 128 bits.
+   */
+  long getScopeId();
+
+  /**
+   * Return the ledger id. The least significant 64 bits of 128 bits.
+   */ 
+  long getId();
+
+  ...
+
+}
+```
+
+##### Create LedgerAdv
+
+Introduce a new method `withLedgerScopeId` in `CreateAdvBuilder` for providing `scopeId`
+(the most significant 64 bits of the 128 bits ledger id) on creating a ledger.
+
+```java
+public interface CreateAdvBuilder extends OpBuilder<WriteHandle> {
+  ...
+
+  /**
+   * Set the scope id for the newly created ledger.
+   * If no explicit scopeId is passed, the new ledger
+   * will be created under scope `0`.
+   */
+  CreateAdvBuilder withLedgerScopeId(long scopeId);    
+
+  ...
+}
+```
+
+##### Open Ledger
+
+Introduce a new method `withLedgerScopeId` in `OpenBuilder` for providing `scopeId`
+(the most significant 64 bits of the 128 bits ledger id) on opening a ledger.
+
+```java
+public interface OpenBuilder extends OpBuilder<ReadHandle> {
+  ...
+  /**
+   * Set the scope id of the ledger to open.
+   */
+  OpenBuilder withLedgerScopeId(long scopeId);
+  ...
+}
+```
+
+##### Delete Ledger
+
+Introduce a new method `withLedgerScopeId` in `DeleteBuilder` for providing `scopeId`
+(the most significant 64 bits of the 128 bits ledger id) on deleting a ledger.
+
+```java
+public interface DeleteBuilder extends OpBuilder<Void> {
+  ...
+  /**
+   * Set the scope id of the ledger to delete.
+   */
+  DeleteBuilder withLedgerScopeId(long scopeId);
+  ...
+}
+```
+
+#### CLI
+
+All BookKeeper CLI tools will be updated with an additional option `--ledger-scope-id`.
+Optionally we can add an option `--ledger-qualified-name` (the hex representation of 128 bits).
+Internally all the CLI tools will convert the ledger qualified name to `ledgerId` and `ledgerScopeId`.
+
+#### REST
+
+1. All ledger related endpoints will add a new parameter `ledger_scope_id`.
+2. `ListLedgerService` only supports listing ledgers under a given ledger scope id.
+   If `ledger_scope_id` is missing, it will list ledgers under scope `0`.
+
+#### Wire Protocol
+
+> There is no plan for supporting 128 bits in v2 protocol, due to the limitation in v2 protocol.
+> So any operation in v2 protocol with scope id not equal to 0 will fail immediately with
+> an `ILLEGAL_OP` exception.
+
+All the request and response messages will add an optional field `optional int64 ledgerScopeId`.
+
+#### Entry Format
+
+Currently all the entries written to bookies are encoded in a certain format, including `metadata`,
+`digest code` and `payload`. The entry format is not *versioned*.
+
+In order to support adding another field `ledgerScopeId` in the `metadata` section, we are introducing
+`version` in the entry format.
+
+##### Entry Format V1
+
+```json
+Entry Format V1
+===============
+--- header ---
+Bytes (0 - 7)                   : Ledger ID
+Bytes (8 - 15)                  : Entry ID
+Bytes (16 - 23)                 : LastAddConfirmed
+Bytes (24 - 31)                 : Length
+--- digest ---
+Bytes (32 - (32 + x - 1))       : Digest Code (e.g. CRC32)
+--- payload ---
+Bytes ((32 + x) - )             : Payload
+```
+
+> `x` is the length of digest code.
+
+> Prior to introducing `ledgerScopeId`, ledgerId is assumed to be a positive value.
+
+##### Entry Format V2
+
+```json
+Entry Format V2
+===============
+--- header ---
+Byte  (0)                       : Metadata Flags
+Bytes (1 - 8)                   : Ledger Scope ID
+Bytes (9 - 16)                  : Ledger ID
+Bytes (17 - 24)                 : Entry ID
+Bytes (25 - 32)                 : LastAddConfirmed
+Bytes (33 - 40)                 : Length
+--- digest ---
+Bytes (41 - (41 + x - 1))       : Digest Code (e.g. CRC32)
+--- payload ---
+Bytes ((41 + x) - )             : Payload
+``` 
+
+> `x` is the length of digest code.
+
+###### Metadata Flags
+
+```json
+Metadata: 1 Byte
+------------------------
+0x 0 0
+   |__| 
+     |
+ version
+
+----
+Bit 0 - 3: digest type (e.g. CRC32, CRC32C and such)
+Bit 4 - 7: version, the most significant bit of this byte will always be set to 1.
+It will be used for differentiating entry format v1 and v2.
+
+```
+
+We are setting the most significant bit to be `1`. So the first byte in entry v2 will
+be a negative value, which can be used for differentiating entry format v1 and v2.
+The version will be encoded into the first byte. The version will be used for describing
+the entry format.
+
+##### Decoding Entry
+
+The pseudo code for decoding an entry is as follows:
+
+```java
+
+ByteBuf entry = ...;
+
+// Peek at the first byte without moving the reader index. In v2 its most
+// significant bit is always 1, so read as a signed byte it is negative.
+byte metadataFlags = entry.getByte(entry.readerIndex());
+
+if (metadataFlags >= 0) { // the entry is encoded in v1 format
+       // decoding the entry in v1 format
+       ...
+} else {
+       // decoding the entry in v2 format
+       ...
+}
+
+```
+
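To make the v1/v2 distinction concrete, here is a small self-contained sketch using `java.nio.ByteBuffer` instead of Netty's `ByteBuf`. It assumes the metadata flags occupy exactly one byte followed by five 8-byte fields, and that the version takes bits 4 to 6 with bit 7 always set. The class `EntryHeaderSketch` and all of its method names are hypothetical.

```java
/** Hypothetical sketch of v1/v2 header handling; not BookKeeper's actual codec. */
final class EntryHeaderSketch {

    /** One flags byte plus five 8-byte fields (assumed layout). */
    static final int V2_HEADER_LENGTH = 1 + 5 * Long.BYTES;

    private EntryHeaderSketch() {}

    /** Packs the flags byte: bit 7 always set, bits 4-6 version, bits 0-3 digest type. */
    static byte metadataFlags(int version, int digestType) {
        if (version < 0 || version > 0x7 || digestType < 0 || digestType > 0xF) {
            throw new IllegalArgumentException("version or digest type out of range");
        }
        return (byte) (0x80 | (version << 4) | digestType);
    }

    /** A v2 entry starts with a negative byte; a v1 entry starts with a non-negative ledger id. */
    static boolean isV2(java.nio.ByteBuffer entry) {
        return entry.get(entry.position()) < 0; // absolute read, position is unchanged
    }

    /** Writes the assumed v2 header and returns a buffer ready for reading. */
    static java.nio.ByteBuffer encodeV2Header(byte flags, long ledgerScopeId, long ledgerId,
                                              long entryId, long lastAddConfirmed, long length) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(V2_HEADER_LENGTH); // big-endian by default
        buf.put(flags);
        buf.putLong(ledgerScopeId);
        buf.putLong(ledgerId);
        buf.putLong(entryId);
        buf.putLong(lastAddConfirmed);
        buf.putLong(length);
        buf.flip();
        return buf;
    }

    /** Scope id of an entry in either format; v1 entries always belong to scope 0. */
    static long ledgerScopeId(java.nio.ByteBuffer entry) {
        return isV2(entry) ? entry.getLong(entry.position() + 1) : 0L;
    }

    /** Ledger id of an entry in either format. */
    static long ledgerId(java.nio.ByteBuffer entry) {
        int offset = isV2(entry) ? 1 + Long.BYTES : 0;
        return entry.getLong(entry.position() + offset);
    }
}
```

The check works only because v1 ledger ids are assumed to be positive, which keeps the first byte of a v1 entry non-negative.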
+#### Bookie Storage
+
+##### Journal
+
+A new method should be added in journal `WriteCallback` to handle `ledgerScopeId`.
+
+```java
+public interface WriteCallback {
+
+    void writeComplete(int rc,
+                       long ledgerScopeId,
+                       long ledgerId,
+                       long entryId,
+                       BookieSocketAddress addr,
+                       Object ctx);
+
+    default void writeComplete(int rc,
+                               long ledgerId,
+                               long entryId,
+                               BookieSocketAddress addr,
+                               Object ctx) {
+        writeComplete(rc, 0L, ledgerId, entryId, addr, ctx);
+    }
+
+}
+```
+
+The journal should be changed to be able to retrieve `ledgerScopeId` from the entry
+payload based on [Entry Format](#entry-format).
+
+##### Ledger Storage
+
+###### EntryLogger
+
+1. Methods in `EntryLogger` should be able to accept `ledgerScopeId` as a parameter.
+2. EntryLogger should be updated to retrieve `ledgerScopeId` from the entry payload
+   based on [Entry Format](#entry-format).
+
+###### EntryMemTable
+
+`ledgerScopeId` should be added as part of `EntryKey`.
+
+###### IndexPersistenceMgr
+
+Currently the ledger index files (64 bits) are stored in a 2-level hierarchy of
+directories - `<msb-32bits-hex>/<lsb-32bits-hex>/<ledger-id-hex>.idx`.
+
+If `ledgerScopeId` is 0, it will use the existing scheme for storing and retrieving
+ledger index files.
+
+If `ledgerScopeId` is not 0, that means the ledgers are produced by new clients that
+support 128 bits. Those ledgers will be stored in a 4-level hierarchy of
+directories -
+`<msb-32bits-hex-ledger-scope-id>/<lsb-32bits-hex-ledger-scope-id>/<msb-32bits-hex-ledger-id>/<lsb-32bits-hex-ledger-id>`.
+
+All the file info caches should be updated to use `<ledgerScopeId, ledgerId>`
+as index keys.
+
+###### IndexInMemPageMgr
+
+The LRU pages map will be updated to use `<ledgerScopeId, ledgerId>` as index
+keys.
+
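A composite key for the caches and page maps mentioned above might be modelled as a small immutable value class. The sketch below is an assumption for illustration only: the name `ScopedLedgerId` does not appear in the proposal, and the signed ordering in `compareTo` is one possible choice.

```java
/** Hypothetical <ledgerScopeId, ledgerId> key usable in hash maps and sorted maps. */
final class ScopedLedgerId implements Comparable<ScopedLedgerId> {

    private final long ledgerScopeId;
    private final long ledgerId;

    ScopedLedgerId(long ledgerScopeId, long ledgerId) {
        this.ledgerScopeId = ledgerScopeId;
        this.ledgerId = ledgerId;
    }

    /** Key for a ledger created before scopes existed; such ledgers live in scope 0. */
    static ScopedLedgerId legacy(long ledgerId) {
        return new ScopedLedgerId(0L, ledgerId);
    }

    long getLedgerScopeId() {
        return ledgerScopeId;
    }

    long getLedgerId() {
        return ledgerId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ScopedLedgerId)) {
            return false;
        }
        ScopedLedgerId other = (ScopedLedgerId) o;
        return ledgerScopeId == other.ledgerScopeId && ledgerId == other.ledgerId;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(ledgerScopeId) + Long.hashCode(ledgerId);
    }

    /** Orders by scope id first, then ledger id, both compared as signed longs. */
    @Override
    public int compareTo(ScopedLedgerId other) {
        int byScope = Long.compare(ledgerScopeId, other.ledgerScopeId);
        return byScope != 0 ? byScope : Long.compare(ledgerId, other.ledgerId);
    }

    @Override
    public String toString() {
        return "<" + ledgerScopeId + ", " + ledgerId + ">";
    }
}
```

Routing old call sites through a factory like `legacy(ledgerId)` keeps them addressing scope `0` without further changes, which mirrors the default-method approach shown for `WriteCallback`.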
+###### DBLedgerStorage
+
+Currently DBLedgerStorage uses `<ledgerId, entryId>` as the index key for indexing entry
+locations for each entry.
+
+Similar to `SortedLedgerStorage` and `InterleavedLedgerStorage`, ledgers whose
+`ledgerScopeId` is 0 will use the existing scheme for storing their entry locations.
+
+Ledgers whose `ledgerScopeId` is not 0 will be stored in a new rocksdb,
+whose index key will be `<ledgerScopeId, ledgerId, entryId>`.
+
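The proposal does not say how the `<ledgerScopeId, ledgerId, entryId>` key would be serialized. One plausible encoding, shown here purely as an assumption, is three big-endian 8-byte fields, so that all entries of one ledger share a 16-byte prefix. The class `EntryLocationKeySketch` is hypothetical and uses only `java.nio.ByteBuffer`, not the RocksDB API.

```java
/** Hypothetical 24-byte key encoding for <ledgerScopeId, ledgerId, entryId>. */
final class EntryLocationKeySketch {

    static final int KEY_LENGTH = 3 * Long.BYTES;

    private EntryLocationKeySketch() {}

    /** Encodes the triple as three big-endian longs (ByteBuffer's default byte order). */
    static byte[] encode(long ledgerScopeId, long ledgerId, long entryId) {
        return java.nio.ByteBuffer.allocate(KEY_LENGTH)
                .putLong(ledgerScopeId)
                .putLong(ledgerId)
                .putLong(entryId)
                .array();
    }

    /** Decodes a key back into {ledgerScopeId, ledgerId, entryId}. */
    static long[] decode(byte[] key) {
        if (key.length != KEY_LENGTH) {
            throw new IllegalArgumentException("expected " + KEY_LENGTH + " bytes, got " + key.length);
        }
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.wrap(key);
        return new long[] {buf.getLong(), buf.getLong(), buf.getLong()};
    }
}
```

With a bytewise key comparator, a fixed-width big-endian layout keeps the non-negative entry ids of one ledger in numeric order, which is why it is a common choice for this kind of index.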
+#### Metadata Store
+
+##### LedgerManager
+
+All the interfaces should be updated with accepting `ledgerScopeId`.
+
+The actual implementation should decide how to store metadata
+for `<ledgerScopeId, ledgerId>`. 
+
+###### ZooKeeper Ledger Manager
+
+We need to introduce a `LongLongHierarchicalLedgerManager` for storing metadata
+indexed by `<ledgerScopeId, ledgerId>`.
+
+If `ledgerScopeId` is 0, then it will fall back to `LongHierarchicalLedgerManager`.
+So no behavior is changed.
+
+If `ledgerScopeId` is not 0, those ledgers will be indexed in a new hierarchy
+(possibly under a different znode).
+
+###### Ledger ID generation
+
+When upgrading from 64 bits to 128 bits, we probably don't need any centralized mechanism
+for generating ledger id. It can be implemented using UUID generation.
+
+Since we are supporting 128 bits by introducing `ledgerScopeId`, an
+application of bookkeeper can decide its own way of generating its `scopeId`.
+An application or even the bookkeeper client can generate its ledger id using UUID generation,
+then break the 128 bits UUID into two parts: one serves as `ledgerScopeId` and the other
+one serves as `ledgerId`.
+
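Splitting a UUID into the two halves can be sketched with the JDK's `java.util.UUID`. Which half becomes `ledgerScopeId` is an assumption here (the most significant half, matching the msb/lsb naming above), and the class `UuidLedgerIdSketch` is hypothetical.

```java
/** Hypothetical helper that derives <ledgerScopeId, ledgerId> from a UUID. */
final class UuidLedgerIdSketch {

    private UuidLedgerIdSketch() {}

    /** Generates a random (version 4) UUID and returns {ledgerScopeId, ledgerId}. */
    static long[] newScopeAndLedgerId() {
        return split(java.util.UUID.randomUUID());
    }

    /** Most significant 64 bits become the scope id, least significant the ledger id. */
    static long[] split(java.util.UUID uuid) {
        return new long[] {uuid.getMostSignificantBits(), uuid.getLeastSignificantBits()};
    }

    /** Reassembles the UUID from the two halves. */
    static java.util.UUID join(long ledgerScopeId, long ledgerId) {
        return new java.util.UUID(ledgerScopeId, ledgerId);
    }

    public static void main(String[] args) {
        long[] ids = newScopeAndLedgerId();
        System.out.println("scope=" + ids[0] + " ledger=" + ids[1] + " uuid=" + join(ids[0], ids[1]));
    }
}
```

Two properties follow from the UUID layout. The version bits of a random UUID sit in the most significant half, so the generated scope id is never `0` and cannot collide with pre-existing ledgers. Either half can be negative as a signed `long`, so ids generated this way rely on the versioned entry format rather than on the v1 assumption of positive ledger ids.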
+###### Etcd
+
+Since Etcd has a better key/value representation, we can basically just combine
+`<ledgerScopeId, ledgerId>` as the index key for storing ledger metadata in Etcd.
+No special consideration is needed.
+
+### Performance Concerns
+
+There shouldn't be any performance difference when not using 128 bit ledger id
+(`ledgerScopeId` is omitted).
+
+Performance concerns may arise in the following areas:
+
+- **Wire Protocol**: additional 9 bytes will be added per entry, one byte for version
+  and 8 bytes for the msb of 128 bit ledger id.
+- **Journal**: additional 9 bytes will be added per entry (same as wire protocol).
+- **EntryLogger**: additional 9 bytes will be added per entry (same as wire protocol).
+- **Memtable**: additional 8 bytes will be added per indexed entry.
+- **FileInfo**: there is no change to the index file format itself.
+- **IndexPersistenceManager**: Files will be organized in a deeper directory hierarchy.
+  It shouldn't be a big deal.
+- **IndexInMemoryManager (LedgerCache)**: additional 8 bytes per index page.
+- **DbLedgerStorage**: additional 8 bytes per entry for entry location.
+- **Metadata**: on zookeeper, we need a 128 bit ledger manager, which means a deeper znode
+  hierarchy than the 64 bit ledger manager. An Etcd-like key/value metadata store is probably
+  preferable for a 128 bit ledger manager.
+
+However increasing ledger id from 64 bits to 128 bits can get rid of the only remaining
+central point, since we don't need to use zookeeper for ledger id generation. The id
+generation can become decentralized.
+
+### Proposed Changes
+
+All the required changes are described above. In summary, the changes can
+happen in following 2 phases:
+
+1. Ensure all components have `ledgerScopeId` added (wire protocol, storage and such),
+   assuming `ledgerScopeId` will be 0. The changes can happen independently and must remain
+   backward compatible with old clients.
+2. Add `ledgerScopeId` into the public API, so applications can start using `ledgerScopeId`.
+   After that, applications can use UUID to generate ledger id and break the UUID into two parts:
+   one is `ledgerScopeId`, while the other one is `ledgerId`.
+
+### Compatibility, Deprecation, and Migration Plan
+
+All the changes are backward compatible, since we are making the changes by adding an optional
+field `ledgerScopeId`. Old clients can still operate in the mode of `ledgerScopeId == 0`.
+New applications can activate the feature by starting to use `ledgerScopeId` in the new API.
+
+### Test Plan
+
+1. Add unit tests for individual components on introducing `ledgerScopeId`.
+2. Add backward compatibility tests for individual components.
+3. Add end-to-end integration tests for introducing `ledgerScopeId`.
+4. Add end-to-end backward compatibility tests.
+
+### Rejected Alternatives
+
+N/A
diff --git a/site/community/bookkeeper_proposals.md b/site/community/bookkeeper_proposals.md
index ee8899e..94bd6a5 100644
--- a/site/community/bookkeeper_proposals.md
+++ b/site/community/bookkeeper_proposals.md
@@ -85,7 +85,7 @@ using Google Doc.
 
 This section lists all the _bookkeeper proposals_ made to BookKeeper.
 
-*Next Proposal Number: 35*
+*Next Proposal Number: 36*
 
 ### Inprogress
 
@@ -106,6 +106,7 @@ Proposal | State
 [BP-32: Advisory (optimistic) write close](../../bps/BP-32-advisory-write-close) | Accepted
 [BP-33: Move releasing docker images out of main repo](../../bps/BP-33-building-official-docker-imags) | Draft
 [BP-34: Cluster Metadata Checker](../../bps/BP-34-cluster-metadata-checker) | Accepted
+[BP-35: 128 bits support](../../bps/BP-35-128-bits-support) | Under Discussion
 
 ### Adopted
