Ivan Bessonov created IGNITE-16102:
--------------------------------------
Summary: Store all RocksDB partitions in a single column family.
Key: IGNITE-16102
URL: https://issues.apache.org/jira/browse/IGNITE-16102
Project: Ignite
Issue Type: Improvement
Affects Versions: 3.0.0-alpha3
Reporter: Ivan Bessonov
The current storage implementation puts each partition in its own column family.
This effectively means that every partition lives in its own database, sharing
only the WAL and some in-memory resources. Given that each column family has
multiple files for its LSM tree, the number of open file descriptors is bigger
than it needs to be.
The idea is to have a single column family for all partitions of a table.
We should also consider storing several tables in the same RocksDB
instance, for similar reasons. You can think of it as cache groups in
Ignite 2.x.
There's also an "optimization" that is missing in the code and should be
implemented: using key hashes as prefixes.
h3. What should be implemented:
First of all, the code will be heavily refactored, which should simplify it in
many places.
Beyond that, I see the following goals:
* the current implementation allows deriving the list of partitions from the
list of column families. This won't be possible anymore, so I suggest storing
this list explicitly in the "meta" CF, in whatever format turns out to be
convenient during the implementation
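One possible format for the explicit partition list, sketched in Java under assumptions: the value stored for a table is a sorted sequence of unsigned 2-byte big-endian partition IDs (matching the 2-byte partitionId of the key layout), and the class name `PartitionListCodec` is hypothetical. How this value is keyed inside the "meta" CF is left open.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical codec for the partition list kept in the "meta" column family.
 * Assumed format: unsigned 2-byte big-endian partition IDs, sorted ascending.
 */
public final class PartitionListCodec {
    private PartitionListCodec() {}

    public static byte[] encode(SortedSet<Integer> partitionIds) {
        // ByteBuffer is big-endian by default.
        ByteBuffer buf = ByteBuffer.allocate(partitionIds.size() * Short.BYTES);
        for (int id : partitionIds) {
            if (id < 0 || id > 0xFFFF) {
                throw new IllegalArgumentException("Partition ID does not fit in 2 bytes: " + id);
            }
            buf.putShort((short) id);
        }
        return buf.array();
    }

    public static SortedSet<Integer> decode(byte[] bytes) {
        if (bytes.length % Short.BYTES != 0) {
            throw new IllegalArgumentException("Corrupted partition list, length " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        SortedSet<Integer> ids = new TreeSet<>();
        while (buf.hasRemaining()) {
            // Read back as unsigned so IDs above 32767 survive the round trip.
            ids.add(Short.toUnsignedInt(buf.getShort()));
        }
        return ids;
    }

    public static void main(String[] args) {
        SortedSet<Integer> ids = new TreeSet<>(List.of(0, 7, 1023));
        byte[] value = encode(ids);
        System.out.println(value.length + " bytes -> " + decode(value)); // prints "6 bytes -> [0, 7, 1023]"
    }
}
```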
* there should be a compact "tableId" representation. IgniteUUID or even UUID
is too much, I think, but it might work as a basis. This problem should be
discussed
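As one option for that discussion, a compact tableId could be a small integer assigned on first use and mapped from the table's UUID. The sketch below is a hypothetical `TableIdRegistry`; it keeps the mapping only in memory, whereas a real storage would have to persist it (for example in the "meta" CF) so that IDs stay stable across restarts.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical registry mapping a table UUID to a compact 4-byte local ID.
 * In-memory only: persistence of the mapping is deliberately left out.
 */
public final class TableIdRegistry {
    private final Map<UUID, Integer> ids = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Returns the existing compact ID for the table, or atomically assigns the next free one. */
    public int compactId(UUID tableId) {
        return ids.computeIfAbsent(tableId, k -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        TableIdRegistry registry = new TableIdRegistry();
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        // prints "0 1 0": the same table always gets the same ID
        System.out.println(registry.compactId(a) + " " + registry.compactId(b) + " " + registry.compactId(a));
    }
}
```

A 4-byte ID saves 12 bytes per key compared to using a raw 16-byte UUID as the prefix.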
* the binary representation of keys should now include the following information:
** tableId - fixed-length set of bytes to be used as a prefix
** partitionId - 2 bytes that will follow the tableId. This layout will allow
running range queries over specific partitions of specific tables
** key hash - 4 bytes. This one is required to optimize key comparison time.
Generally speaking, it's safe to assume that hashes will mostly differ for
different keys, meaning that in most cases comparing hashes will be enough to
determine that two keys are not equal
** actual key payload goes after all these prefixes
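The key layout above can be sketched in Java as follows. Assumptions: tableId is a 4-byte compact ID (its width is not decided yet), `Arrays.hashCode` stands in for whatever hash function the implementation picks, and the class and method names are hypothetical. All fields are written big-endian, so an unsigned bytewise comparison (the order used by RocksDB's default comparator) sorts keys by table, then partition, then hash, then payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch of the key layout: [tableId: 4][partitionId: 2][keyHash: 4][payload ...].
 * Field widths for tableId and the hash function are assumptions.
 */
public final class PartitionKeyLayout {
    static final int PARTITION_PREFIX_SIZE = Integer.BYTES + Short.BYTES;
    static final int HEADER_SIZE = PARTITION_PREFIX_SIZE + Integer.BYTES;

    private PartitionKeyLayout() {}

    /** Prefix shared by all keys of one partition of one table; usable as a range-scan bound. */
    public static byte[] partitionPrefix(int tableId, int partitionId) {
        return ByteBuffer.allocate(PARTITION_PREFIX_SIZE)
                .putInt(tableId)
                .putShort((short) partitionId) // partition IDs assumed to be in 0..65535
                .array();
    }

    /** Full key; ByteBuffer writes big-endian, so byte order matches the field order. */
    public static byte[] key(int tableId, int partitionId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .put(partitionPrefix(tableId, partitionId))
                .putInt(Arrays.hashCode(payload)) // hypothetical hash function
                .put(payload)
                .array();
    }

    /** True if the key starts with the given partition prefix. */
    public static boolean inPartition(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    /**
     * Unsigned lexicographic comparison. If table, partition or hash differ, the first
     * mismatch lies within HEADER_SIZE bytes, so payload bytes are only reached when
     * all three match.
     */
    public static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = key(1, 5, "alice".getBytes(StandardCharsets.UTF_8));
        byte[] k2 = key(1, 5, "bob".getBytes(StandardCharsets.UTF_8));
        byte[] other = key(1, 6, "alice".getBytes(StandardCharsets.UTF_8));
        byte[] prefix = partitionPrefix(1, 5);
        // prints "true true false"
        System.out.println(inPartition(k1, prefix) + " " + inPartition(k2, prefix) + " " + inPartition(other, prefix));
        System.out.println(compare(k1, k2) != 0); // prints "true"
    }
}
```

One consequence of this layout: since the hash precedes the payload, keys inside a partition are ordered by hash rather than by key value, so scans over a whole partition stay cheap, while ordered scans over key ranges would not be possible.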
--
This message was sent by Atlassian Jira
(v8.20.1#820001)