[
https://issues.apache.org/jira/browse/HDDS-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ChenXi updated HDDS-10465:
--------------------------
Target Version/s: 2.0.0, 1.4.1
> Change ozone.client.bytes.per.checksum default to 16KB
> ------------------------------------------------------
>
> Key: HDDS-10465
> URL: https://issues.apache.org/jira/browse/HDDS-10465
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: image-2024-03-07-18-23-23-121.png
>
>
>
> When using TestDFSIO to compare the random read performance of HDFS and
> Ozone, Ozone is much slower than HDFS. Here is the data from tests in a
> YCloud cluster.
> Test Suite: TestDFSIO
> Number of files: 64
> File Size: 1024MB
> ||Random read(execution time)||Round1(s)||Round2(s)||
> |HDFS| 47.06|49.5|
> |Ozone|147.31|149.47|
> And for Ozone itself, sequential read is much faster than random read:
> ||Ozone||Round1(s)||Round2(s)||Round3(s)||
> |read execution time|66.62|58.78|68.98|
> |random read execution time|147.31|149.47|147.09|
> While for HDFS, there is not much of a gap between its sequential and random
> read execution times:
> ||HDFS||Round1(s)||Round2(s)||
> |read execution time|51.53|44.88|
> |random read execution time|47.06|49.5|
> After some investigation, it was found that the total bytes read from the DN
> in the TestDFSIO random read test is almost double the data size. Here the
> total data to read is 64 * 1024MB = 64GB, while the aggregated DN
> bytesReadChunk metric value increased by 128GB after one test run. The root
> cause is that when the client reads data, it aligns the requested data range
> with "ozone.client.bytes.per.checksum", which is currently 1MB. For example,
> when reading 1 byte, the client sends a request to the DN to fetch 1MB of
> data. When reading 2 bytes whose offsets cross a 1MB boundary, the client
> fetches the first 1MB for the first byte and the second 1MB for the second
> byte. In random read mode, TestDFSIO uses a read buffer of size 1000000 bytes
> = 976.5KB, which is why the total bytes read from the DN is double the data
> size.
> By comparison, HDFS uses the property "file.bytes-per-checksum", which is
> 512 bytes by default.
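The alignment behavior described above can be sketched with a small model (this is illustrative arithmetic, not Ozone client code; the function name is hypothetical):

```python
# Model of checksum-boundary alignment: every bytes-per-checksum chunk that a
# requested byte range touches must be fetched from the DN in full, because
# checksum verification works on whole chunks.

def bytes_fetched(offset: int, length: int, bytes_per_checksum: int) -> int:
    """Bytes actually fetched from the DN for a read of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first_chunk = offset // bytes_per_checksum
    last_chunk = (offset + length - 1) // bytes_per_checksum
    return (last_chunk - first_chunk + 1) * bytes_per_checksum

MB = 1024 * 1024

# Reading 1 byte pulls a full 1MB checksum chunk:
print(bytes_fetched(0, 1, MB))                 # 1048576

# A 2-byte read straddling a 1MB boundary pulls two full chunks:
print(bytes_fetched(MB - 1, 2, MB))            # 2097152

# TestDFSIO's 1000000-byte buffer usually straddles a 1MB boundary,
# so most random reads fetch 2MB instead of ~1MB:
print(bytes_fetched(MB - 1, 1_000_000, MB))    # 2097152
```

With a 16KB chunk size the same 1000000-byte read fetches at most about 62 chunks (~992KB + up to one extra chunk), so the over-read shrinks from roughly 2x to a few percent.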
> To improve Ozone random read performance, a straightforward idea is to use a
> smaller "ozone.client.bytes.per.checksum" default value. We tested 1MB, 16KB
> and 8KB, and gathered the data below using TestDFSIO (64 files, each 512MB):
>
> ||ozone.client.bytes.per.checksum||write1(s)||write2(s)||write3(s)||read1(s)||read2(s)||read3(s)||read average||random read1||random read2||random read3||random average||
> |1MB|163.01|163.34|141.9|47.25|51.86|52.02|50.28|114.42|90.38|97.83|100.88|
> |16KB|160.6|144.43|165.08|63.36|67.68|69.94|66.89|55.94|72.14|55.43|61.17|
> |8KB|149.97|161.01|161.57|66.46|61.61|63.17|63.75|62.06|71.93|58.56|64.18|
>
> From the above data, we can see that for the same amount of data:
> * write: execution times show no obvious differences across all three cases
> * sequential read: 1MB bytes.per.checksum has the best execution time; 16KB
> and 8KB have close execution times.
> * random read: 1MB has the worst execution time; 16KB and 8KB have close
> execution times.
> * For either 16KB or 8KB bytes.per.checksum, sequential read and random read
> have close execution times, similar to HDFS behavior.
> Changing bytes.per.checksum from 1MB to 16KB costs a bit of sequential read
> performance, but the gain in random read performance is much larger.
> Applications that rely heavily on random read, such as HBase, Impala, and
> Iceberg (Parquet), will all benefit from this.
> So this task proposes to change the ozone.client.bytes.per.checksum default
> value from the current 1MB to 16KB, and to lower the property's minimum
> limit from 16KB to 8KB, to improve overall read performance.
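For clusters that want the new behavior before upgrading, the value can also be set explicitly in ozone-site.xml (a sketch; the property name follows this issue's title, and the size-with-unit value format is assumed to be accepted as for other Ozone storage-size settings):

```xml
<property>
  <name>ozone.client.bytes.per.checksum</name>
  <value>16KB</value>
  <description>Checksum granularity for Ozone client reads/writes;
  smaller values reduce over-read on random access.</description>
</property>
```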
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]