[ 
https://issues.apache.org/jira/browse/HDDS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1094:
--------------------------------
    Description: 
Goal:
It can be useful to exercise the IO and control paths in Ozone against simulated 
large datasets without having huge disk capacity at hand. For example, this 
will let us test features such as container reports and incremental container 
reports without needing huge cluster capacity. The 
[SimulatedFSDataset|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java]
 does something similar in HDFS. It has been an invaluable tool for simulating 
large data stores.

  was:
Goal:
Make Ozone chunk read/write operations CPU- or network-bound for specially 
constructed performance microbenchmarks.
Remove disk bandwidth and latency constraints: running the Ozone data path 
against extremely low-latency, high-throughput storage will expose performance 
bottlenecks in the flow. But low-latency storage (NVMe flash drives, 
storage-class memory, etc.) is expensive and its availability is limited. Is 
there a workaround that achieves similar running conditions for the software 
without actually having the low-latency storage, at least for specially 
constructed datasets such as zero-filled blocks (*not* zero-length blocks)?

Required characteristics of the solution:
No changes to the Ozone client, OM, or SCM. Changes are limited to the 
Datanode, with a minimal footprint in the Datanode code.

Possible High level Approach:
The ChunkManager and ChunkUtils can allow writeChunk to drop zero-filled chunks 
without actually writing them to the local filesystem. Similarly, readChunk can 
construct a zero-filled buffer without reading from the local filesystem 
whenever it detects a zero-filled chunk. The specifics of how to detect and 
record a zero-filled chunk can be discussed on this Jira, along with how to 
control this behaviour and make it available only for internal testing.
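
As a rough illustration of the approach described above, the following Java 
sketch drops zero-filled chunks on write and rebuilds them in memory on read. 
Everything in it is an assumption made for illustration: the class 
ZeroFilledChunkSketch, its method signatures, and the in-memory maps (which 
stand in for the local filesystem and for whatever record of zero-filled 
chunks the Datanode would keep) are hypothetical and are not the actual 
ChunkManager or ChunkUtils API. Detecting zero-filled data by scanning the 
buffer is only one possible choice; the description leaves that open.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Ozone's ChunkManager or ChunkUtils API.
 * HashMaps stand in for the Datanode's local filesystem and metadata.
 */
class ZeroFilledChunkSketch {
  // Payloads that were really "written to disk".
  private final Map<String, byte[]> onDisk = new HashMap<>();
  // Lengths of zero-filled chunks whose payload was dropped.
  private final Map<String, Integer> zeroChunkLengths = new HashMap<>();

  /** Returns true when every remaining byte in the buffer is zero. */
  static boolean isZeroFilled(ByteBuffer data) {
    ByteBuffer view = data.duplicate(); // leave the caller's position untouched
    while (view.hasRemaining()) {
      if (view.get() != 0) {
        return false;
      }
    }
    return true;
  }

  /** Records only the length of a zero-filled chunk; stores other chunks in full. */
  void writeChunk(String chunkName, ByteBuffer data) {
    if (isZeroFilled(data)) {
      onDisk.remove(chunkName);
      zeroChunkLengths.put(chunkName, data.remaining());
      return;
    }
    byte[] copy = new byte[data.remaining()];
    data.duplicate().get(copy);
    zeroChunkLengths.remove(chunkName);
    onDisk.put(chunkName, copy);
  }

  /** Rebuilds zero-filled chunks in memory; reads other chunks from the store. */
  ByteBuffer readChunk(String chunkName) {
    Integer length = zeroChunkLengths.get(chunkName);
    if (length != null) {
      return ByteBuffer.allocate(length); // allocate() returns a zeroed buffer
    }
    byte[] stored = onDisk.get(chunkName);
    if (stored == null) {
      throw new IllegalArgumentException("Unknown chunk: " + chunkName);
    }
    return ByteBuffer.wrap(stored.clone());
  }

  /** Total payload bytes actually stored, to check that zero chunks cost nothing. */
  long storedBytes() {
    long total = 0;
    for (byte[] payload : onDisk.values()) {
      total += payload.length;
    }
    return total;
  }
}
```

In this sketch, writing a 4096-byte zero-filled buffer leaves storedBytes() 
unchanged, while a later readChunk for the same name still returns 4096 zero 
bytes. A real implementation would also need the switch mentioned above so 
that the behaviour is available only for internal testing.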



> Performance test infrastructure : skip writing user data on Datanode
> --------------------------------------------------------------------
>
>                 Key: HDDS-1094
>                 URL: https://issues.apache.org/jira/browse/HDDS-1094
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: Supratim Deka
>            Assignee: Supratim Deka
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Goal:
> It can be useful to exercise the IO and control paths in Ozone against simulated 
> large datasets without having huge disk capacity at hand. For example, this 
> will let us test features such as container reports and incremental container 
> reports without needing huge cluster capacity. The 
> [SimulatedFSDataset|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java]
>  does something similar in HDFS. It has been an invaluable tool for simulating 
> large data stores.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
