[
https://issues.apache.org/jira/browse/HDDS-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-4395:
---------------------------------
Labels: pull-request-available (was: )
> Ozone Data Generator for Fast Scale Test
> ----------------------------------------
>
> Key: HDDS-4395
> URL: https://issues.apache.org/jira/browse/HDDS-4395
> Project: Hadoop Distributed Data Store
> Issue Type: New Feature
> Components: Tools
> Affects Versions: 1.0.0
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Labels: pull-request-available
> Attachments: Ozone Data Generator for Fast Scale Test.pdf
>
>
> I've been working on this fun project and would like to share it with the
> community.
>
> h1. Synopsis
> We want to prove that Ozone runs well at scale, both in the number of keys
> (billions of keys) and on dense DataNodes, where each DN has hundreds of
> TB or even PB-scale capacity.
> h1. Challenge: Data generation
> The challenge is to generate a huge data set fast enough that we can
> benchmark the system quickly. No existing tool can generate data at this scale.
>
> h1. Proposal
> The major bottleneck is OM’s key insertion performance. In addition, Ozone
> uses a single pipeline to write data, unless multi-raft is enabled.
>
> Instead of using Ozone's client API to generate data, we should write
> directly to the RocksDB instances of OM, SCM and DN. RocksDB can support
> [up to a million key|https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks]
> bulk load operations.
>
> Similarly, we can skip the normal Ozone client write path and populate the
> container DB and block files directly.
>
> (more details in the design doc)
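
The description above leaves the loader itself to the design doc. As a rough, non-authoritative sketch of one preparatory step for a RocksDB-style bulk load, the snippet below generates synthetic keys in ascending order (RocksDB's SST file writer expects keys in sorted order) and estimates load time. The function names, the `key-` prefix, the empty values, and the throughput of one million keys per second are all assumptions made for illustration; none of them come from the issue or from Ozone's code.

```python
from typing import Iterator, List, Tuple


def key_name(index: int, total: int, prefix: str = "key-") -> str:
    """Zero-pad the index so lexicographic order matches numeric order."""
    width = len(str(total - 1))
    return f"{prefix}{index:0{width}d}"


def sorted_batches(total: int, batch_size: int) -> Iterator[List[Tuple[str, bytes]]]:
    """Yield batches of (key, value) pairs in ascending key order.

    Values are empty placeholders; a real generator would serialize
    the metadata record for each key here.
    """
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        yield [(key_name(i, total), b"") for i in range(start, end)]


def estimated_seconds(total_keys: int, keys_per_second: float) -> float:
    """Back-of-envelope load time at an assumed, constant throughput."""
    return total_keys / keys_per_second
```

Under the assumed rate, `estimated_seconds(1_000_000_000, 1_000_000)` gives 1000 seconds per billion keys. The zero-padding matters because a bulk loader that writes sorted files cannot accept `key-10` before `key-9`.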
--
This message was sent by Atlassian Jira
(v8.3.4#803005)