[ 
https://issues.apache.org/jira/browse/RATIS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691921#comment-17691921
 ] 

Xinyu Tan edited comment on RATIS-1787 at 2/22/23 4:45 AM:
-----------------------------------------------------------

Yes, if IoTDB really needs to send a snapshot, it can now calculate the md5 
while streaming the snapshot, so it no longer needs to read the files just to 
calculate the md5 when generating the snapshot. In most cases no snapshot 
transfer is required, so the work in 
[RATIS-1597|https://issues.apache.org/jira/browse/RATIS-1597] makes a lot of 
sense for reducing the overhead of takeSnapshot.
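
To illustrate what "calculate md5 when streaming" could look like, here is a 
minimal sketch using only the JDK. The class and method names 
(StreamingMd5Sketch, copyWithMd5) are made up for this example and are not 
Ratis or IoTDB APIs; the actual RATIS-1597 change may be structured quite 
differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Ratis or IoTDB.
final class StreamingMd5Sketch {

  /**
   * Copies all bytes from in to out and returns the hex MD5 of those bytes.
   * The digest is updated as a side effect of the copy, so the data is read once.
   */
  static String copyWithMd5(InputStream in, OutputStream out)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (DigestInputStream din = new DigestInputStream(in, md5)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = din.read(buf)) != -1) {
        out.write(buf, 0, n); // stands in for "send this chunk to the follower"
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    System.out.println(copyWithMd5(new ByteArrayInputStream(data), sink));
  }
}
```

The point of the sketch is only that the digest is a by-product of the transfer 
itself, so nothing has to be read from disk at takeSnapshot time.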

Maybe we can offer different verification policies for different security 
requirements, such as a fileSize policy, an md5 policy, etc.
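
A rough sketch of how such policies might be expressed, again with JDK classes 
only. VerifyPolicySketch and its verify signature are assumptions made for this 
example; nothing like this exists in Ratis today as far as this comment is 
concerned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical policy type for illustration only; not a Ratis API.
enum VerifyPolicySketch {
  /** Cheap check: only compares the received file's length with the expected one. */
  FILE_SIZE {
    @Override
    boolean verify(Path received, long expectedSize, byte[] expectedMd5) throws IOException {
      return Files.size(received) == expectedSize;
    }
  },
  /** Stronger check: re-reads the received file and compares its MD5 digest. */
  MD5 {
    @Override
    boolean verify(Path received, long expectedSize, byte[] expectedMd5) throws IOException {
      MessageDigest md5;
      try {
        md5 = MessageDigest.getInstance("MD5");
      } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException(e);
      }
      try (InputStream in = Files.newInputStream(received)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
          md5.update(buf, 0, n);
        }
      }
      return MessageDigest.isEqual(md5.digest(), expectedMd5);
    }
  };

  /** Returns true if the received file passes this policy's check. */
  abstract boolean verify(Path received, long expectedSize, byte[] expectedMd5)
      throws IOException;
}
```

A deployment that trusts its transport could pick FILE_SIZE and skip digest 
work entirely, while one that wants end-to-end verification could pick MD5. How 
the policy would be configured is left open here.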


> Don't generate md5 file for each file when a file is received during 
> InstallSnapshot to reduce the pressure on the file system
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-1787
>                 URL: https://issues.apache.org/jira/browse/RATIS-1787
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Xinyu Tan
>            Assignee: Xinyu Tan
>            Priority: Major
>
> The IoTDB community has experienced a number of performance issues with MD5 
> calculations when using the Snapshot feature.
> Originally, the MD5 calculation was meant to detect errors in file 
> transmission. However, in the previous implementation, generating a Snapshot 
> required loading every file to calculate its MD5, even when the snapshot did 
> not need to be sent at that time, which caused a lot of resource contention 
> in the background. [~William Song] in RATIS-1597 combined the client-side MD5 
> calculation with streaming, thus avoiding a lot of background IO and 
> computing tasks. However, the Snapshot receiver still stores one MD5 file 
> for each received file. At present, an IoTDB snapshot may contain tens of 
> thousands of files, and tens of thousands of small md5 files put a lot of 
> pressure on the file system. Is it possible to avoid storing an md5 file for 
> each file when receiving?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
