[ 
https://issues.apache.org/jira/browse/RATIS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295705#comment-17295705
 ] 

runzhiwang edited comment on RATIS-1312 at 3/5/21, 2:58 AM:
------------------------------------------------------------

[~szetszwo]  Following are the test results on a 3-datanode cluster without 
writing to disk, while varying the number of files and the file size.

1. 6400 files, file size is 8MB, buffer size is 2MB

HDFS fails to finish, streaming takes 86 seconds.

2. 4096 files, file size is 8MB, buffer size is 2MB

HDFS fails to finish, streaming takes 60 seconds.

3. 2048 files, file size is 8MB, buffer size is 2MB

HDFS takes 22 seconds, streaming takes 24 seconds.

4. 2048 files, file size is 16MB, buffer size is 2MB

HDFS takes 37 seconds, streaming takes 56 seconds.

5. 2048 files, file size is 64MB, buffer size is 2MB

HDFS takes 125 seconds, streaming takes 201 seconds.

6. 1024 files, file size is 128MB, buffer size is 2MB

HDFS takes 120 seconds, streaming takes 203 seconds.

Conclusion:
1. When writing 4096 or more files, HDFS fails to finish, but streaming succeeds.
2. When writing small files, such as 8MB files, streaming performs similarly 
to HDFS (24 vs. 22 seconds).
3. When writing larger files (more than 8MB), streaming is slower than HDFS: 
it takes about 1.66 times as long (roughly 1.5x to 1.7x across tests 4 to 6).
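
The slowdown in conclusion 3 can be re-derived from the timings above. This is only a small sketch: the numbers are copied from tests 3 to 6, and the names `results` and `slowdown` are made up for illustration.

```python
# Timings in seconds copied from tests 3 to 6 above:
# (number of files, file size in MB, HDFS seconds, streaming seconds)
results = [
    (2048, 8, 22, 24),
    (2048, 16, 37, 56),
    (2048, 64, 125, 201),
    (1024, 128, 120, 203),
]


def slowdown(hdfs_seconds, streaming_seconds):
    """How many times longer streaming took than HDFS for one test."""
    return streaming_seconds / hdfs_seconds


ratios = [round(slowdown(h, s), 2) for _, _, h, s in results]

for (files, size_mb, _, _), ratio in zip(results, ratios):
    print(f"{files} files x {size_mb}MB: streaming/hdfs = {ratio}")
```

Tests 1 and 2 are left out because HDFS did not finish, so no ratio can be computed for them.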

In my view, Ozone mostly writes large files (the default block size is 128MB), 
and a datanode will rarely write 4096 or more files at the same time. So 
Ozone with streaming cannot outperform HDFS.
 
As you said, HDFS allocates one thread and one socket for each block, so HDFS 
can stream each packet to disk and forward it to the next datanode in 60KB 
units. In addition, streaming uses one socket for all blocks, so every 
buffer-sized message has to be marked with the block it belongs to, and 
encoding and decoding the proto also costs time. For these two reasons, 
streaming is slower than HDFS.
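
To illustrate the second reason, here is a minimal sketch of sharing one socket between several blocks. It is not the Ratis wire format (Ratis encodes its headers as proto); the 12-byte header, `encode_frame` and `decode_frames` are hypothetical and only show that every message has to carry a block id and be decoded before it can be routed.

```python
import struct

# Hypothetical header, NOT the Ratis format: 8-byte block id + 4-byte payload length.
HEADER = struct.Struct(">QI")


def encode_frame(block_id, payload):
    """Prefix a payload with the block it belongs to and its length."""
    return HEADER.pack(block_id, len(payload)) + payload


def decode_frames(data):
    """Split a byte string read from the shared socket back into (block_id, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(data):
        block_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        frames.append((block_id, data[offset:offset + length]))
        offset += length
    return frames


# Messages for two blocks interleaved on a single connection.
wire = encode_frame(1, b"aaaa") + encode_frame(2, b"bb") + encode_frame(1, b"cc")

per_block = {}
for block_id, payload in decode_frames(wire):
    per_block.setdefault(block_id, bytearray()).extend(payload)
```

With one socket per block, as in HDFS, neither the header nor the routing step is needed, because the connection itself identifies the block.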

Maybe we can use the HDFS write path in Ozone? If the number of threads is the 
problem, maybe we can use one thread to manage many sockets.
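
A rough sketch of the "one thread, many sockets" idea, using Python's standard `selectors` module and local socket pairs in place of real datanode connections. The function name `drain_all` and the block names are made up for illustration; in Java the same pattern would be built on an NIO selector.

```python
import selectors
import socket


def drain_all(receivers):
    """Read every socket to EOF from a single thread.

    Returns a dict mapping the socket's index to the bytes received on it.
    """
    sel = selectors.DefaultSelector()
    received = {i: bytearray() for i in range(len(receivers))}
    for i, sock in enumerate(receivers):
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=i)

    open_count = len(receivers)
    while open_count:
        # Wait until at least one socket is readable, then serve only those.
        for key, _ in sel.select():
            chunk = key.fileobj.recv(64 * 1024)
            if chunk:
                received[key.data].extend(chunk)
            else:
                # An empty read means the peer closed its end.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return received


# Four local socket pairs stand in for four per-block connections.
pairs = [socket.socketpair() for _ in range(4)]
for i, (writer, _) in enumerate(pairs):
    writer.sendall(f"block-{i}".encode())
    writer.close()

received = drain_all([reader for _, reader in pairs])
```

Each block keeps its own socket, so no per-message block id is needed, while the number of threads stays at one regardless of how many blocks are open.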




> Compare the performance between HDFS and DataStreamApi
> ------------------------------------------------------
>
>                 Key: RATIS-1312
>                 URL: https://issues.apache.org/jira/browse/RATIS-1312
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: runzhiwang
>            Priority: Major
>         Attachments: hdfs.svg, screenshot-1.png, streaming.svg
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
