[jira] [Commented] (IGNITE-14747) RocksDB research: configuration, lifecycle, basic integration

Ivan Bessonov (Jira) Thu, 03 Jun 2021 08:01:36 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356500#comment-17356500 ]


Ivan Bessonov commented on IGNITE-14747:
----------------------------------------

Some research results:
 * RocksDB is pretty easy to integrate. Among other features, it lets you 
store arbitrary data in sorted order, iterate through it, and take snapshots 
of the state that stay valid until the next restart.
 * Every DB instance can have multiple "column families". I view them as 
partitions and possibly as "index.bin" candidates. There is support for 
multi-column-family batch writes, which is good for SQL indexes. There is also 
"dropColumnFamilies" to evict multiple partitions at once.
 ** Here we have a potential issue: evicted partitions will still be present 
in the LSM tree until it is fully compacted. That will take some time, meaning 
that we will temporarily store more data than necessary, on top of the 
duplicated entries that the LSM tree already holds.
 * Every instance has its own WAL. We should consider disabling it, because 
its role will be taken over by rebalancing from the RAFT log.
 * For the first implementation we could create a new RocksDB instance for 
every table.
 ** Cons: memory consumption is hard to configure. As far as I know, we can't 
force several RocksDB instances to share a common memory limit.
 ** Pros: better read performance. Every cache tree is separate and hence much 
smaller, giving you fewer lookups in general.
 * Using RocksDB for the RAFT log. From what I understand, the log is basically 
a "long -> value" cache with an auto-incrementing key and extremely rare update 
operations, so it is almost append-only. This approach may be suboptimal for a 
simple reason: merging layer files is effectively equal to concatenation, but 
there's no way to tell the engine that. This will lead to excessive IO when we 
don't need it.
 * Lifecycle: not much to say here. We should start it before starting caches 
and stop it after stopping caches. There should be an explicit way to pass the 
partition number to the API; these details will be decided later.
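
To illustrate the RAFT log point above, here is a minimal toy sketch in plain 
Java. It is not RocksDB code and does not model real compaction; the class and 
method names (AppendOnlyMergeSketch, merge, concat, run) are made up for this 
example. It assumes each sorted run holds consecutive, never-overlapping log 
indexes, and shows that a generic sorted merge then produces exactly the same 
output as plain concatenation while still comparing and copying entries one by 
one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of sorted runs of an append-only log. Not RocksDB code. */
class AppendOnlyMergeSketch {
    /** Number of key comparisons performed by the last merge() call. */
    static long comparisons = 0;

    /** Generic two-way merge of two sorted runs, entry by entry. */
    static List<Long> merge(List<Long> older, List<Long> newer) {
        comparisons = 0;
        List<Long> out = new ArrayList<>(older.size() + newer.size());
        int i = 0;
        int j = 0;
        while (i < older.size() && j < newer.size()) {
            comparisons++;
            if (older.get(i) <= newer.get(j)) {
                out.add(older.get(i++));
            } else {
                out.add(newer.get(j++));
            }
        }
        // Copy whatever is left in either run.
        while (i < older.size()) {
            out.add(older.get(i++));
        }
        while (j < newer.size()) {
            out.add(newer.get(j++));
        }
        return out;
    }

    /** What an append-only log needs: the newer run placed after the older one. */
    static List<Long> concat(List<Long> older, List<Long> newer) {
        List<Long> out = new ArrayList<>(older);
        out.addAll(newer);
        return out;
    }

    /** Builds a run holding the consecutive log indexes [from, to). */
    static List<Long> run(long from, long to) {
        List<Long> r = new ArrayList<>();
        for (long k = from; k < to; k++) {
            r.add(k);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Long> older = run(0, 1000);
        List<Long> newer = run(1000, 2000);
        List<Long> merged = merge(older, newer);
        System.out.println("same result: " + merged.equals(concat(older, newer)));
        System.out.println("comparisons spent by merge: " + comparisons);
    }
}
```

In this model the merge does work proportional to the run size for a result 
that concatenation gives directly, which is the kind of overhead the comment 
is worried about when a general-purpose engine is not told that key ranges 
never overlap.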

> RocksDB research: configuration, lifecycle, basic integration
> -------------------------------------------------------------
>
>                 Key: IGNITE-14747
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14747
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Sergey Chugunov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: iep-74, ignite-3
>             Fix For: 3.0
>
>
> In accordance with 
> [IEP-74|https://cwiki.apache.org/confluence/display/IGNITE/IEP-74+Data+Storage],
>  the first implementation of persistent Storage will be based on RocksDB K-V 
> storage.
> Thus research is needed on how to integrate it into the ignite-3 realm. The 
> following questions should be covered:
> # What additional configuration properties are needed.
> # How to reconcile lifecycle of RocksDB instance with Ignite node lifecycle.
> # How RocksDB abstractions (e.g. partitions) match with Ignite abstractions.
> Also, the scope of tasks to implement a basic Storage API over RocksDB should 
> be defined.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
