[
https://issues.apache.org/jira/browse/HDDS-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksei Ieshin updated HDDS-15586:
----------------------------------
Status: Patch Available (was: In Progress)
> Add freon command to read a user-supplied list of existing keys
> -----------------------------------------------------------------
>
> Key: HDDS-15586
> URL: https://issues.apache.org/jira/browse/HDDS-15586
> Project: Apache Ozone
> Issue Type: Improvement
> Components: freon
> Reporter: Aleksei Ieshin
> Assignee: Aleksei Ieshin
> Priority: Major
> Labels: pull-request-available
>
> h2. Problem
>
>
> freon's client read generators can only read keys they themselves
> generated:
>
> * {{ockg}}/{{ockv}} use prefix+index naming, and {{ockv}} validates every
> read against key-0's digest (assumes all keys have identical content).
>
> * {{SameKeyReader}} ({{ocokr}}) reads one fixed key from many threads.
>
> There is no freon command that points at an arbitrary, heterogeneous set of
> existing keys (a real dataset already in a bucket) and measures read
> throughput. This is needed for read-path performance and capacity/scaling
> work, where freshly generated uniform keys are page-cache-hot and not
> representative of production data.
>
> h2. Proposed change
>
>
> Add a freon subcommand {{OzoneClientKeyListReader}} ({{ocklr}}) that:
>
>
> * takes {{--key-file <path>}}: a local file with one key name per line;
> blank lines and {{#}} comments are ignored;
> * reuses {{BaseFreonGenerator}}: a warm shared {{OzoneClient}}, {{-t}}
> threads, {{-n}} total reads (task {{i}} reads {{keys[i % keys.size()]}}, so
> an {{-n}} larger than the list wraps around it), and a DropWizard timer;
> * for each read, calls {{bucket.readKey(key)}}, drains the stream into a
> fixed buffer, and counts bytes (no content/digest assumptions); reports the
> {{key-read}} timer plus an aggregate MB/s line (total bytes / wall time).
>
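A minimal sketch of the key-file handling and task-to-key mapping described in the list above. This is not the code from the patch: the class and method names are hypothetical, and trimming whitespace and treating only whole lines that start with # as comments are assumptions about details the issue does not spell out. It takes the lines as a list so it runs standalone; a real command would probably load them with something like Files.readAllLines(path).

```java
import java.util.ArrayList;
import java.util.List;

public class KeyListSketch {

  /** Keeps one key per line; skips blank lines and lines starting with '#'. */
  public static List<String> parseKeys(List<String> lines) {
    List<String> keys = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim(); // assumption: surrounding whitespace is not part of the key
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      keys.add(trimmed);
    }
    return keys;
  }

  /** Maps task index i to keys[i % keys.size()], so -n may exceed the list size. */
  public static String keyForTask(List<String> keys, long taskIndex) {
    return keys.get((int) (taskIndex % keys.size()));
  }

  public static void main(String[] args) {
    List<String> keys =
        parseKeys(List.of("# sample dataset", "dir/a.bin", "", "  dir/b.bin  "));
    // Five tasks over two keys: the list is walked round-robin.
    for (long i = 0; i < 5; i++) {
      System.out.println("task " + i + " -> " + keyForTask(keys, i));
    }
  }
}
```

With the example command in this issue (-n 160), a key file of 40 entries would have each key read four times under this mapping.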
>
> It exercises the same end-to-end read path as {{ozone sh key get}} and the
> FileSystem {{open()}} ({{readKey}} -> {{KeyInputStream}} ->
> {{BlockInputStream}} -> {{ChunkInputStream}} -> datanode {{ReadChunk}}), so
> results reflect the real client read stack. It also separates client warmth
> (JIT + pooled datanode connections) from datanode page-cache effects, and
> raising {{-t}} increases concurrency to show where read throughput
> saturates.
>
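Whatever stream sits at the top of that read path, the per-read work in the proposal is a plain drain loop plus byte counting. The sketch below illustrates that idea under stated assumptions: the class and method names are hypothetical, a ByteArrayInputStream stands in for the stream returned by readKey, and 1 MB is taken as 1024 * 1024 bytes. It is not the implementation from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DrainSketch {

  /** Reads the stream to EOF through a fixed buffer and returns the byte count. */
  public static long drain(InputStream in, byte[] buffer) {
    long total = 0;
    try {
      int n;
      while ((n = in.read(buffer)) != -1) {
        total += n; // content is discarded; only the size is recorded
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }

  /** Aggregate throughput: total bytes over wall time, in MB/s. */
  public static double megabytesPerSecond(long bytes, long elapsedNanos) {
    return (bytes / (1024.0 * 1024.0)) / (elapsedNanos / 1e9);
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = new byte[3 * 1024 * 1024]; // stand-in for one key's content
    byte[] buffer = new byte[4096];             // hypothetical buffer size
    long start = System.nanoTime();
    long bytes;
    try (InputStream in = new ByteArrayInputStream(payload)) {
      bytes = drain(in, buffer);
    }
    long elapsed = System.nanoTime() - start;
    System.out.printf("read %d bytes, %.1f MB/s%n",
        bytes, megabytesPerSecond(bytes, elapsed));
  }
}
```

Because nothing is compared against a digest, the loop works for keys of any size or content, which is the point of the "no content/digest assumptions" wording.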
>
> h2. Example
>
>
> {code}
> ozone freon ocklr -v <volume> -b <bucket> --key-file /tmp/keys.txt -t 8 -n 160
> {code}
>
> h2. Implementation notes
>
>
> * ~110 LOC in hadoop-ozone/tools, mirrors {{OzoneClientKeyValidator}};
> registered via {{@MetaInfServices(FreonSubcommand.class)}}. No new
> dependencies. Unit test for key-file parsing included.
> * Possible refinements from the discussion: per-key MB/s (mean ± stddev), a
> {{--buffer-size}} option, a thread-local read buffer, and/or routing the
> throughput summary through freon's standard report instead of a log line.
> * Naming ({{ocklr}}) follows the {{ockv}}/{{ockg}}/{{ocokr}} pattern; open
> to alternatives.
>
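For the per-key MB/s (mean ± stddev) refinement mentioned above, one possible shape is sketched here. The class name is hypothetical, the sample values are made up for illustration, and using the sample standard deviation (dividing by n - 1) is an assumption; the issue does not say which variant is intended.

```java
public class ThroughputStats {

  /** Arithmetic mean of the per-key throughput values. */
  public static double mean(double[] values) {
    double sum = 0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  /** Sample standard deviation (n - 1 in the denominator); 0 for fewer than two values. */
  public static double sampleStdDev(double[] values) {
    if (values.length < 2) {
      return 0.0;
    }
    double m = mean(values);
    double squares = 0;
    for (double v : values) {
      squares += (v - m) * (v - m);
    }
    return Math.sqrt(squares / (values.length - 1));
  }

  public static void main(String[] args) {
    // Illustrative per-key MB/s figures, not measurements.
    double[] perKeyMbps = {110.0, 95.5, 102.3, 87.9};
    System.out.printf("per-key throughput: %.1f ± %.1f MB/s%n",
        mean(perKeyMbps), sampleStdDev(perKeyMbps));
  }
}
```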
>
> Discussed and supported on the community forum:
> https://github.com/apache/ozone/discussions/10460
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
