[ 
https://issues.apache.org/jira/browse/HDDS-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenXi updated HDDS-10465:
--------------------------
    Target Version/s: 2.0.0, 1.4.1

> Change ozone.client.bytes.per.checksum default to 16KB
> ------------------------------------------------------
>
>                 Key: HDDS-10465
>                 URL: https://issues.apache.org/jira/browse/HDDS-10465
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>         Attachments: image-2024-03-07-18-23-23-121.png
>
>
>  
> When using TestDFSIO to compare the random read performance of HDFS and 
> Ozone, Ozone is much slower than HDFS. Here are the results measured on the 
> YCloud cluster.
> Test Suite: TestDFSIO
> Number of files: 64
> File Size: 1024MB
> ||Random read(execution time)||Round1(s)||Round2(s)||
> |HDFS| 47.06|49.5|
> |Ozone|147.31|149.47|
> And for Ozone itself, sequential read is much faster than random read:
> ||Ozone||Round1(s)||Round2(s)||Round3(s)||
> |read execution time|66.62|58.78|68.98|
> |random read execution time|147.31|149.47|147.09|
> For HDFS, by contrast, there is not much of a gap between sequential read 
> and random read execution time:
> ||HDFS||Round1(s)||Round2(s)||
> |read execution time|51.53|44.88|
> |random read execution time|47.06|49.5|
> After some investigation, we found that the total bytes read from the DNs in 
> the TestDFSIO random read test is almost double the data size. The total 
> data to read is 64 * 1024MB = 64GB, while the aggregated DN bytesReadChunk 
> metric increases by 128GB after one test run. The root cause is that when the 
> client reads data, it aligns the requested range with 
> "ozone.client.bytes.per.checksum", which is currently 1MB. For example, to 
> read 1 byte, the client sends a request to the DN to fetch 1MB of data. To 
> read 2 bytes whose offsets straddle a 1MB boundary, the client fetches the 
> first 1MB for the first byte and the second 1MB for the second byte. In 
> random read mode, TestDFSIO uses a read buffer of 1000000 bytes (about 
> 976.5KB), so a read at a random offset usually straddles a 1MB boundary and 
> fetches 2MB, which is why the total bytes read from the DNs is about double 
> the data size.
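
The alignment arithmetic described above can be sketched in a few lines. This is a simplified model for illustration, not the Ozone client code: the helper names `aligned_fetch` and `mean_amplification` are hypothetical, and the model assumes read offsets are uniformly distributed and ignores end-of-file truncation.

```python
MB = 1024 * 1024
KB = 1024


def aligned_fetch(offset: int, length: int, bytes_per_checksum: int) -> int:
    """Bytes fetched when [offset, offset + length) is widened to
    bytes_per_checksum boundaries (hypothetical model of the alignment)."""
    if length <= 0:
        return 0
    # Round the start down and the end up to the nearest boundary.
    start = (offset // bytes_per_checksum) * bytes_per_checksum
    end = -(-(offset + length) // bytes_per_checksum) * bytes_per_checksum
    return end - start


def mean_amplification(length: int, bytes_per_checksum: int) -> float:
    """Average (bytes fetched / bytes requested) over all offsets within one
    checksum window. Averaging aligned_fetch over those offsets gives
    bytes_per_checksum + length - 1 bytes fetched per read."""
    return (bytes_per_checksum + length - 1) / length


print(aligned_fetch(0, 1, MB))       # 1048576: 1 byte costs a full 1MB window
print(aligned_fetch(MB - 1, 2, MB))  # 2097152: 2 bytes across a boundary cost 2MB

buffer_size = 1_000_000  # TestDFSIO random read buffer size from the report
for label, bpc in (("1MB", MB), ("16KB", 16 * KB), ("8KB", 8 * KB)):
    print(f"{label:>4}: {mean_amplification(buffer_size, bpc):.2f}x")
# 1MB: 2.05x, 16KB: 1.02x, 8KB: 1.01x
```

Under these assumptions, the 1MB setting gives roughly 2x read amplification for a 1000000-byte buffer, which is consistent with the 128GB read from the DNs for 64GB of data, while 16KB or 8KB brings the overhead down to a few percent.
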
> By comparison, HDFS uses the property "file.bytes-per-checksum", which is 
> 512 bytes by default.
> To improve Ozone random read performance, a straightforward idea is to 
> use a smaller default value for "ozone.client.bytes.per.checksum". We tested 
> 1MB, 16KB and 8KB with TestDFSIO (64 files, 512MB each):
>  
> ||ozone.client.bytes.per.checksum||write1(s)||write2(s)||write3(s)||read1(s)||read2(s)||read3(s)||read average(s)||random read1(s)||random read2(s)||random read3(s)||random average(s)||
> |1MB|163.01|163.34|141.9|47.25|51.86|52.02|50.28|114.42|90.38|97.83|100.88|
> |16KB|160.6|144.43|165.08|63.36|67.68|69.94|66.89|55.94|72.14|55.43|61.17|
> |8KB|149.97|161.01|161.57|66.46|61.61|63.17|63.75|62.06|71.93|58.56|64.18|
>  
> From the above data, we can see that for the same amount of data:
>  * write: the execution time shows no obvious difference across the three 
> cases.
>  * sequential read: 1MB bytes.per.checksum has the best execution time. 16KB 
> and 8KB have similar execution times.
>  * random read: 1MB has the worst execution time. 16KB and 8KB have similar 
> execution times.
>  * For either 16KB or 8KB bytes.per.checksum, sequential read and random read 
> have similar execution times, similar to HDFS behavior.
> If bytes.per.checksum is changed from 1MB to 16KB, sequential read 
> performance drops somewhat, but the gain in random read performance is much 
> larger. Applications that rely heavily on random reads, such as HBase, 
> Impala and Iceberg (Parquet), will all benefit from this.
> So this task proposes to change the ozone.client.bytes.per.checksum default 
> value from the current 1MB to 16KB, and to lower the current minimum limit of 
> the property from 16KB to 8KB, to improve overall read performance.
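
The trade-off in the proposal can be quantified from the averages reported in the table above. The sketch below only restates those reported averages and computes the relative change against the 1MB baseline; the names `reported_avg_s` and `percent_change` are illustrative and not part of Ozone or TestDFSIO.

```python
# Average execution times in seconds, copied from the reported table
# (TestDFSIO, 64 files, 512MB each).
reported_avg_s = {
    "1MB": {"read": 50.28, "random_read": 100.88},
    "16KB": {"read": 66.89, "random_read": 61.17},
    "8KB": {"read": 63.75, "random_read": 64.18},
}


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent.
    Positive means slower (longer execution time)."""
    return (candidate - baseline) / baseline * 100.0


baseline = reported_avg_s["1MB"]
for size in ("16KB", "8KB"):
    for workload in ("read", "random_read"):
        delta = percent_change(baseline[workload], reported_avg_s[size][workload])
        print(f"{size:>4} {workload:<11} {delta:+.1f}% vs 1MB")
```

With these figures, moving from 1MB to 16KB costs roughly a third more time on sequential read and saves close to 40% on random read.
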



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
