harish876 opened a new issue, #180:
URL: https://github.com/apache/incubator-resilientdb/issues/180

   # Background
   
   Currently, secondary indexing is offloaded by exporting all chain data to an external datastore. This introduces several moving parts, including utilities like [Python Cache](https://github.com/apache/incubator-resilientdb-resilient-python-cache).
   
   Although the chain is 
[in-memory](https://github.com/apache/incubator-resilientdb/blob/master/chain/state/chain_state.h#L38),
 every client issues a 
[GetAllBlocks](https://github.com/apache/incubator-resilientdb-graphql/blob/main/service/http_server/crow_service.cpp#L394)
 request. This retrieves the entire blockchain state, pulling *all* the data. 
While modifying this structure could improve overall blockchain state 
retrieval, that is not our current focus.
   
   # Goal
   
   Instead, our goal is to improve read latencies for queries based on secondary-attribute lookups from the [storage layer](https://github.com/apache/incubator-resilientdb/blob/master/chain/storage/storage.h), which is the most common use case for applications. These would be applications using the storage engine as a document store, i.e., storing values as JSON objects, or otherwise as a simple key-value store. A few applications that do this and are part of this release are:
   
   1. [ResLens](https://github.com/apache/incubator-resilientdb-ResLens) - Uses 
ResDB as a simple KV store
   2. [ResCanvas](https://github.com/ResilientApp/ResCanvas) - Uses ResDB as a 
document store
   3. [Coinsensus](https://github.com/ResilientApp/Coinsensus-Backend) - Uses ResDB as a document store
   
   # Problem Statement
   
   Currently, if an application has to perform a lookup based on a secondary attribute, the following steps are required:
    - Hit the [GetAllValues](https://github.com/apache/incubator-resilientdb-graphql/blob/main/service/http_server/crow_service.cpp#L67) endpoint.
    - Manually apply filtering logic in memory at the application layer.
    - Alternatively, export all the data to an external datastore, sync it constantly, and apply the filtering logic there.
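   To make the cost concrete, here is a minimal sketch of what the status quo forces onto the application: pull every value and filter in memory. Documents are modeled as flat attribute maps to keep the sketch dependency-free (the real responses are JSON strings); all names here are illustrative, not from the actual codebase.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A document is modeled as a flat attribute map for this sketch.
using Doc = std::map<std::string, std::string>;

// Returns the primary keys of all documents whose `attr` equals `wanted`.
// Note the O(N) scan over the entire store -- the cost this proposal
// aims to avoid.
std::vector<std::string> FilterByAttribute(
    const std::map<std::string, Doc>& all_values,  // key -> document
    const std::string& attr, const std::string& wanted) {
  std::vector<std::string> matches;
  for (const auto& [key, doc] : all_values) {
    auto it = doc.find(attr);
    if (it != doc.end() && it->second == wanted) matches.push_back(key);
  }
  return matches;
}
```

   Every query pays for the full dataset, regardless of how selective the predicate is.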
   
   # Proposed Solution
   
   Composite keys are a great way to add indexing support on non-primary attributes. They allow indexing by a single field, by multiple fields, and building covering indexes for different workloads. They are widely used for this purpose, for example in MySQL's [MyRocks](https://github.com/facebook/mysql-5.6/wiki/MyRocks-record-format#secondary-index-c) storage engine.
   
   They are also used by the [Hyperledger](https://hyperledger-fabric.readthedocs.io/en/release-2.5/) blockchain as a lightweight indexing mechanism atop a key-value store; see [Composite keys in Hyperledger](https://pkg.go.dev/github.com/hyperledger/fabric/core/chaincode/shim#ChaincodeStub.CreateCompositeKey).
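   To illustrate the idea, here is one possible key encoding (the delimiter and layout are assumptions for this sketch, not the final format): the index name, the indexed attribute value, and the primary key are concatenated with a low-sorting separator, so that all entries for the same (index, value) pair are adjacent in the ordered key space of LevelDB/RocksDB and can be retrieved with a single prefix scan.

```cpp
#include <cassert>
#include <string>

// Encode a composite key as index \x01 value \x01 primary_key.
// The 0x01 separator sorts below printable characters, so entries
// group first by index, then by value, then by primary key.
std::string MakeCompositeKey(const std::string& index,
                             const std::string& value,
                             const std::string& primary_key) {
  const char kSep = '\x01';
  return index + kSep + value + kSep + primary_key;
}
```

   Because the store keeps keys sorted, a lookup on a secondary attribute becomes a cheap iterator seek instead of a full scan.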
   
   We can leverage LevelDB or [RocksDB's BlobDB](https://github.com/facebook/rocksdb/wiki/BlobDB); the latter reduces write amplification for large values, which are common in applications that use ResilientDB as a document store.
   
   # Technical Details
    - A PR with the relevant header files will be attached.
    - The general idea is outlined below.
   
    1. The API layer adds two new endpoints:
          - CreateCompositeKey
          - GetByCompositeKey
    
    2. The storage engine adds these two calls to its interface and implements them.
    3. The proto files need to be extended with these two calls so that the API can talk to the ResilientDB process.
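   The steps above could look roughly like the following sketch. The call names match the proposed endpoints, but the signatures and the in-memory backend are assumptions for illustration, not the actual PR; `MemoryStorage` stands in for a LevelDB/RocksDB-backed implementation, using `std::map`'s ordering in place of an iterator prefix seek.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical extension of the storage interface with the two new calls.
class Storage {
 public:
  virtual ~Storage() = default;
  virtual bool CreateCompositeKey(const std::string& index,
                                  const std::string& value,
                                  const std::string& primary_key) = 0;
  virtual std::vector<std::string> GetByCompositeKey(
      const std::string& index, const std::string& value) = 0;
};

// In-memory stand-in for a LevelDB/RocksDB-backed implementation.
class MemoryStorage : public Storage {
 public:
  bool CreateCompositeKey(const std::string& index, const std::string& value,
                          const std::string& primary_key) override {
    // The index entry's value is empty: the key alone carries the mapping.
    entries_[index + kSep + value + kSep + primary_key] = "";
    return true;
  }

  std::vector<std::string> GetByCompositeKey(
      const std::string& index, const std::string& value) override {
    const std::string prefix = index + kSep + value + kSep;
    std::vector<std::string> out;
    // Seek to the first key with the prefix, then scan while it matches.
    for (auto it = entries_.lower_bound(prefix);
         it != entries_.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
      out.push_back(it->first.substr(prefix.size()));  // the primary key
    }
    return out;
  }

 private:
  static constexpr char kSep = '\x01';
  std::map<std::string, std::string> entries_;
};
```

   The proto changes in step 3 would simply mirror these two calls as request/response messages so the API layer can forward them to the ResilientDB process.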
   
   
   # Advantages of our Solution
    - **Improved Read Latency**: Enables faster lookups by indexing secondary 
attributes directly in the storage layer.
   
   - **Reduces Application Complexity**: Eliminates the need for client-side 
filtering logic or full data scans.
   
   - **No External Sync Required**: Removes dependency on external databases 
and continuous export/sync pipelines.
   
   - **Lightweight Implementation**: Composite keys require minimal overhead 
and can be implemented without significant architectural changes.
   
   - **Supports Richer Queries**: Enables filtering and retrieval by multiple fields or field combinations (covering indexes).
   
   - **Built on Proven Techniques**: Uses battle-tested patterns from systems 
like MyRocks (MySQL) and Hyperledger Fabric.
   
   - **Document Store Friendly**: Optimized for use cases where values are 
stored as JSON or large blobs, especially with RocksDB’s BlobDB.
   
   - **Scalable Design**: Can handle high write and read throughput while 
preserving query efficiency.
   
   - **Expands Use Cases**: Unlocks new classes of applications like 
dashboards, real-time analytics, and search-backed services.
   
   - **Easy Adoption**: Existing applications do not need to change their code. All these features are add-ons and can be enabled without deleting, modifying, or migrating data.
   
   # Disadvantages of our Solution
   - **Increased Write Amplification**: Any secondary index needs extra space and extra writes; this is the classic space-versus-time trade-off. We can limit the number of composite keys an application is allowed to create. The index entries themselves take up very little space: by default they are non-clustered, storing only a reference to the primary record rather than a copy of the value.
   
   - **Increased App Complexity**: We do not create composite keys automatically; rather, we ask the developer to build them. This works just like inserting another key-value pair. The write then becomes non-atomic, but the application can handle this by wrapping the data insert and the index creation in a single transaction.
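   The mitigation above can be sketched as staging the document write and its index write together and applying them as one unit, the way a LevelDB/RocksDB `WriteBatch` would. All names in this sketch are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for a storage-engine write batch: puts are staged
// and later applied together.
struct WriteBatch {
  std::vector<std::pair<std::string, std::string>> puts;
  void Put(const std::string& key, const std::string& value) {
    puts.emplace_back(key, value);
  }
};

void ApplyAtomically(std::map<std::string, std::string>& kv,
                     const WriteBatch& batch) {
  // A real engine commits the batch under a single WAL record, so
  // either both the document and its index entry land, or neither does.
  for (const auto& [k, v] : batch.puts) kv[k] = v;
}
```

   With this pattern, a reader can never observe an index entry pointing at a document that was not written, or vice versa.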


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
