[ 
https://issues.apache.org/jira/browse/HDDS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796601#comment-16796601
 ] 

Siddharth Wagle edited comment on HDDS-1234 at 3/19/19 10:58 PM:
-----------------------------------------------------------------

The initial version of this patch could simply create the flat reverse 
mapping in RocksDB instead of (ContainerId, KeyPrefix) -> Count. The main 
reason is that the count-based mapping might need some optimization to avoid a 
read and a write to RocksDB for every key read from OM's RocksDB. *Note*: The 
iteration would be key-ordered, but the reads and writes go to a (ContainerId, 
KeyPrefix)-ordered LSM tree, so the access pattern is effectively random. The 
read might still be served from a memtable, but we would be doing this for 
each key read from OM's RocksDB. To some extent, this would take away the 
benefit of condensing the keyspace by storing only the prefix.
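
To make the cost concrete, the per-key read-modify-write described above could look roughly like the sketch below. This is only an illustration: a TreeMap stands in for the Recon container DB, each OM key is assumed to map to a single container, the prefix is assumed to be everything up to the last '/', and the names FlatReverseMapSketch, ContainerKeyPrefix, prefixOf and build are made up for this example, not taken from the patch.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for the Recon container DB (RocksDB).
// All class and method names here are hypothetical.
class FlatReverseMapSketch {

  // Composite key (containerId, keyPrefix), ordered by containerId first.
  static final class ContainerKeyPrefix
      implements Comparable<ContainerKeyPrefix> {
    final long containerId;
    final String keyPrefix;

    ContainerKeyPrefix(long containerId, String keyPrefix) {
      this.containerId = containerId;
      this.keyPrefix = keyPrefix;
    }

    @Override
    public int compareTo(ContainerKeyPrefix o) {
      int c = Long.compare(containerId, o.containerId);
      return c != 0 ? c : keyPrefix.compareTo(o.keyPrefix);
    }
  }

  // Assumption: the prefix is everything up to and including the last '/'.
  static String prefixOf(String omKey) {
    int idx = omKey.lastIndexOf('/');
    return idx < 0 ? "" : omKey.substring(0, idx + 1);
  }

  // The input iterates in OM key order; every entry costs one read and one
  // write against the (containerId, keyPrefix)-ordered store.
  static TreeMap<ContainerKeyPrefix, Integer> build(
      TreeMap<String, Long> omKeyToContainer) {
    TreeMap<ContainerKeyPrefix, Integer> store = new TreeMap<>();
    for (Map.Entry<String, Long> e : omKeyToContainer.entrySet()) {
      ContainerKeyPrefix k =
          new ContainerKeyPrefix(e.getValue(), prefixOf(e.getKey()));
      Integer current = store.get(k);                   // read
      store.put(k, current == null ? 1 : current + 1);  // write
    }
    return store;
  }
}
```

Because consecutive OM keys can land in different containers, consecutive store accesses jump around the (ContainerId, KeyPrefix) key space even though the input is sorted.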

_An alternate proposal (for a future Jira)_:

- Store the computed state in the memory of the Recon server and write it to 
the RocksDB store at the end of the iteration.
- Estimate of required memory (assuming a 200TB storage node): 40,000 
containers * 1000 Datanodes / 3 replication factor = approximately 13 million 
containers.
- It is definitely possible to store a Map with an 8-byte ContainerID as the 
key.
- The value could be a *prefix-tree* data structure for efficient lookups, 
i.e. a trie of the keyPrefix.
- The Null node or Empty marker node of the Trie can store the count.
- Additionally, since the */Volume/Bucket* part of the keyPrefix would most 
likely be common or poorly distributed, we could make it a part of the map 
key, giving (Container, KeyPrefixPart) -> (KeyPrefixTrie).
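
A rough sketch of the in-memory variant above, to show where the count would live and when the store is touched. Everything here is an assumption made for illustration: the trie is keyed on '/'-separated path components, a TreeMap with "containerId:prefix" string keys stands in for RocksDB, and InMemoryPrefixTrieSketch, Node, add, flush and estimatedContainers are made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: names are hypothetical; a TreeMap stands in for RocksDB.
class InMemoryPrefixTrieSketch {

  // Trie node over path components of the keyPrefix. The count lives on the
  // node that terminates a prefix (the "empty marker" in the proposal).
  static final class Node {
    final Map<String, Node> children = new TreeMap<>();
    long count; // number of keys whose prefix ends at this node
  }

  // containerId -> trie of key prefixes, held in memory during iteration.
  private final Map<Long, Node> byContainer = new HashMap<>();

  // Record one key read from the OM snapshot; no store access happens here.
  void add(long containerId, String keyPrefix) {
    Node node = byContainer.computeIfAbsent(containerId, id -> new Node());
    for (String part : keyPrefix.split("/")) {
      if (part.isEmpty()) {
        continue; // skip the empty component before a leading '/'
      }
      node = node.children.computeIfAbsent(part, p -> new Node());
    }
    node.count++;
  }

  // Walk every trie once and emit "containerId:prefix" -> count entries.
  Map<String, Long> flush() {
    Map<String, Long> store = new TreeMap<>();
    for (Map.Entry<Long, Node> e : byContainer.entrySet()) {
      collect(e.getValue(), e.getKey() + ":", store);
    }
    return store;
  }

  private static void collect(Node node, String path, Map<String, Long> out) {
    if (node.count > 0) {
      out.put(path, node.count);
    }
    for (Map.Entry<String, Node> c : node.children.entrySet()) {
      collect(c.getValue(), path + "/" + c.getKey(), out);
    }
  }

  // Back-of-the-envelope container count using the figures from the estimate.
  static long estimatedContainers(long perNode, long datanodes, long replication) {
    return perNode * datanodes / replication;
  }
}
```

With the figures from the estimate, estimatedContainers(40000, 1000, 3) comes to 13,333,333, i.e. the roughly 13 million map entries mentioned above. The store is written only once, in flush(), after the whole OM snapshot has been iterated.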

These are just some initial thoughts. IMO, v1 of Recon should store the flat 
reverse map in RocksDB and do this as an optimization in v2. cc: [~anu]





> Iterate the OM DB snapshot and populate the recon container DB. 
> ----------------------------------------------------------------
>
>                 Key: HDDS-1234
>                 URL: https://issues.apache.org/jira/browse/HDDS-1234
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 0.5.0
>
>
> * OM DB snapshot contains the Key->ContainerId + BlockId information. 
> * Iterate the OM snapshot DB and create the reverse map of (ContainerId, Key 
> prefix) -> Key count to be stored in the Recon container DB.
> * Use a codec to store data into Recon container DB.



