[ 
https://issues.apache.org/jira/browse/HDDS-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei Ieshin updated HDDS-15586:
----------------------------------
    Status: Patch Available  (was: In Progress)

> Add freon command to read a user-supplied list of existing keys  
> -----------------------------------------------------------------
>
>                 Key: HDDS-15586
>                 URL: https://issues.apache.org/jira/browse/HDDS-15586
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: freon
>            Reporter: Aleksei Ieshin
>            Assignee: Aleksei Ieshin
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Problem
>
> freon's client read generators can only read keys they themselves
> generated:
>
> * {{ockg}}/{{ockv}} use prefix+index naming, and {{ockv}} validates every
>   read against key-0's digest (assumes all keys have identical content).
> * {{SameKeyReader}} ({{ocokr}}) reads one fixed key from many threads.
>
> There is no freon command that points at an arbitrary, heterogeneous set of
> existing keys (a real dataset already in a bucket) and measures read
> throughput. This is needed for read-path performance and capacity/scaling
> work, where freshly generated uniform keys are page-cache-hot and not
> representative of production data.
>
> h2. Proposed change
>
> Add a freon subcommand {{OzoneClientKeyListReader}} ({{ocklr}}) that:
>
> * takes {{--key-file <path>}}, a local file with one key name per line;
>   blank lines and {{#}} comments are ignored;
> * reuses {{BaseFreonGenerator}}: a warm shared {{OzoneClient}}, {{-t}}
>   threads, {{-n}} total reads (task i reads {{keys[i % keys.size()]}}, so an
>   {{-n}} larger than the list loops over it), DropWizard timer;
> * per read, calls {{bucket.readKey(key)}}, drains the stream into a fixed
>   buffer and counts bytes (no content/digest assumptions); reports the
>   {{key-read}} timer plus an aggregate bytes/wall-time MB/s line.
>
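
The key-file rules and the task-to-key mapping in the list above can be sketched as follows. This is a minimal illustration, not the code from the patch: the class and method names are hypothetical, and trimming whitespace and treating only whole lines that start with # as comments are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; names are not taken from the patch.
final class KeyListSketch {

  // Keeps one key name per line, skipping blank lines and '#' comment lines.
  // Assumption: lines are trimmed and a comment must start the line.
  // In practice the lines could come from Files.readAllLines(Paths.get(keyFile)).
  static List<String> parseKeys(Iterable<String> lines) {
    List<String> keys = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      keys.add(trimmed);
    }
    return keys;
  }

  // Task i reads keys[i % keys.size()], so a task count larger than the
  // list wraps around and re-reads keys from the start.
  static String keyForTask(List<String> keys, long taskIndex) {
    return keys.get((int) (taskIndex % keys.size()));
  }

  public static void main(String[] args) {
    List<String> keys =
        parseKeys(List.of("# sample list", "logs/a.bin", "", "logs/b.bin"));
    for (long i = 0; i < 5; i++) {
      System.out.println("task " + i + " reads " + keyForTask(keys, i));
    }
  }
}
```

With a two-entry list and five tasks, tasks 0, 2 and 4 map to the first key and tasks 1 and 3 to the second, which is the looping behaviour described for {{-n}}.
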
> It exercises the same end-to-end read path as {{ozone sh key get}} and the
> FileSystem {{open()}} ({{readKey}} -> {{KeyInputStream}} ->
> {{BlockInputStream}} -> {{ChunkInputStream}} -> datanode {{ReadChunk}}), so
> results reflect the real client read stack. It also separates client warmth
> (JIT + pooled datanode connections) from datanode page-cache effects, and
> {{-t}} drives concurrency to find where read throughput saturates.
>
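
The drain-and-count step and the aggregate bytes/wall-time figure from the proposed change might look roughly like this. It is a sketch under stated assumptions: it works on a plain java.io.InputStream (in freon the stream would come from {{bucket.readKey(key)}}), the 4 MiB buffer size is arbitrary, MB is taken as 1024 * 1024 bytes, and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not the implementation from the patch.
final class DrainSketch {

  // Assumption: 4 MiB read buffer. The issue only says "a fixed buffer".
  static final int BUFFER_SIZE = 4 * 1024 * 1024;

  // Reads the stream to end-of-file, discarding the content and returning
  // the number of bytes seen. No digest or content check is performed.
  static long drain(InputStream in, byte[] buffer) throws IOException {
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      total += n;
    }
    return total;
  }

  // Aggregate throughput: total bytes over wall-clock time.
  // Assumption: 1 MB = 1024 * 1024 bytes.
  static double megabytesPerSecond(long bytes, long elapsedNanos) {
    double seconds = elapsedNanos / 1_000_000_000.0;
    return (bytes / (1024.0 * 1024.0)) / seconds;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a key's content; a real run would read from the cluster.
    byte[] payload = new byte[10 * 1024 * 1024];
    byte[] buffer = new byte[BUFFER_SIZE];

    long start = System.nanoTime();
    long bytes = drain(new ByteArrayInputStream(payload), buffer);
    long elapsed = Math.max(1, System.nanoTime() - start);

    System.out.printf("read %d bytes, %.1f MB/s%n",
        bytes, megabytesPerSecond(bytes, elapsed));
  }
}
```

Counting bytes instead of validating content keeps the per-read cost close to pure I/O, which matches the "no content/digest assumptions" point in the description.
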
> h2. Example
>
> {code}
> ozone freon ocklr -v <volume> -b <bucket> --key-file /tmp/keys.txt -t 8 -n 160
> {code}
>
> h2. Implementation notes
>
> * ~110 LOC in hadoop-ozone/tools, mirrors {{OzoneClientKeyValidator}};
>   registered via {{@MetaInfServices(FreonSubcommand.class)}}. No new
>   dependencies. Unit test for key-file parsing included.
> * Possible refinements from the discussion: per-key MB/s (mean ± stddev), a
>   {{--buffer-size}} option, a thread-local read buffer, and/or routing the
>   throughput summary through freon's standard report instead of a log line.
> * Naming ({{ocklr}}) follows the {{ockv}}/{{ockg}}/{{ocokr}} pattern; open
>   to alternatives.
>
> Discussed and supported on the community forum:
> https://github.com/apache/ozone/discussions/10460
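
One of the possible refinements listed above, per-key MB/s reported as mean ± stddev, could be accumulated along these lines. This is only a sketch of one option: it uses Welford's online algorithm and the sample standard deviation, both of which are choices made here and not something stated in the issue, and the class name is hypothetical.

```java
// Hypothetical accumulator for per-key throughput; not part of the patch.
final class ThroughputStatsSketch {
  private long count;
  private double mean;
  private double m2; // running sum of squared deviations from the mean

  // Welford's online update. Synchronized so several reader threads
  // can record into one shared instance.
  synchronized void add(double mbPerSec) {
    count++;
    double delta = mbPerSec - mean;
    mean += delta / count;
    m2 += delta * (mbPerSec - mean);
  }

  synchronized double mean() {
    return mean;
  }

  // Sample standard deviation (n - 1 denominator); 0 with fewer than 2 samples.
  synchronized double stddev() {
    return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
  }

  public static void main(String[] args) {
    ThroughputStatsSketch stats = new ThroughputStatsSketch();
    // Made-up per-key figures, purely to exercise the accumulator.
    for (double v : new double[] {100.0, 120.0, 140.0}) {
      stats.add(v);
    }
    System.out.printf("per-key MB/s: %.1f ± %.1f%n", stats.mean(), stats.stddev());
  }
}
```

An online update avoids keeping every per-key sample in memory, which matters when {{-n}} is large.
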



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
