Song Ziyang created RATIS-1587:
----------------------------------

             Summary: InstallSnapshot fails when snapshot has multiple chunks
                 Key: RATIS-1587
                 URL: https://issues.apache.org/jira/browse/RATIS-1587
             Project: Ratis
          Issue Type: Bug
          Components: server
    Affects Versions: 2.3.0
            Reporter: Song Ziyang
         Attachments: image-2022-05-31-08-56-35-373.png, 
image-2022-05-31-08-58-44-717.png, image-2022-05-31-09-00-41-286.png

*Bug Description*

Currently, when a follower installs a snapshot from the leader, the leader 
divides the snapshot files into a sequence of fixed-size chunks (16MB) and 
sends each chunk in its own RPC request.

!image-2022-05-31-08-56-35-373.png!

Only the last RPC request in the sequence is tagged with 'Done'.

!image-2022-05-31-08-58-44-717.png!
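
For readers without access to the screenshots, the chunking can be sketched 
roughly as follows. This is a minimal illustration for a single file, not the 
Ratis code: ChunkPlanSketch, Chunk and plan are hypothetical names invented 
for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names are Ratis classes.
final class ChunkPlanSketch {
  /** Hypothetical stand-in for one InstallSnapshot chunk request. */
  static final class Chunk {
    final long offset;
    final int length;
    final boolean done;

    Chunk(long offset, int length, boolean done) {
      this.offset = offset;
      this.length = length;
      this.done = done;
    }
  }

  /** Splits a file of fileSize bytes into chunks of at most chunkSize bytes. */
  static List<Chunk> plan(long fileSize, int chunkSize) {
    if (fileSize <= 0 || chunkSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    List<Chunk> chunks = new ArrayList<>();
    for (long offset = 0; offset < fileSize; offset += chunkSize) {
      int length = (int) Math.min(chunkSize, fileSize - offset);
      // Only the chunk that reaches the end of the file is tagged done.
      boolean done = offset + length == fileSize;
      chunks.add(new Chunk(offset, length, done));
    }
    return chunks;
  }
}
```

For a 40MB file and 16MB chunks, plan returns three chunks (16MB, 16MB, 8MB) 
and only the third one has done set.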

However, when the follower handles this sequence of RPCs, it creates *a random 
temp dir for each RPC request, stores the chunk in it, and moves only the last 
chunk from the tmp dir to the sm dir.*

*!image-2022-05-31-09-00-41-286.png!*

Thus, *when a snapshot contains multiple files or a single file larger than 
16MB,* InstallSnapshot fails because only the last chunk is stored in the /sm 
dir and the others are left behind in separate tmp dirs.
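
The failure can be reproduced in isolation with a small simulation. The sketch 
below is not the Ratis follower code: RandomTmpDirSketch, install and demo are 
hypothetical names, and each chunk is written as its own file purely to keep 
the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;
import java.util.stream.Stream;

// Illustrative simulation only; none of these names are Ratis classes.
final class RandomTmpDirSketch {
  /** Stages every chunk under a fresh random tmp dir; returns the file count in sm. */
  static long install(Path root, int chunkCount) throws IOException {
    Path smDir = Files.createDirectories(root.resolve("sm"));
    for (int i = 0; i < chunkCount; i++) {
      // A new random directory per request: earlier chunks become unreachable.
      Path tmpDir = Files.createDirectories(
          root.resolve("tmp").resolve(UUID.randomUUID().toString()));
      Path chunkFile = Files.write(tmpDir.resolve("chunk-" + i), new byte[] {(byte) i});
      boolean done = i == chunkCount - 1;
      if (done) {
        // Only the tmp dir of the final request is moved into sm.
        Files.move(chunkFile, smDir.resolve(chunkFile.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
    try (Stream<Path> files = Files.list(smDir)) {
      return files.count();
    }
  }

  /** Runs install in a throwaway temp directory. */
  static long demo(int chunkCount) {
    try {
      return install(Files.createTempDirectory("ratis-sketch"), chunkCount);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With three chunks, demo(3) leaves a single file in sm; the other two chunks 
stay in their own tmp dirs.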

 

*How To Fix*

Instead of using a random UUID to name the tmp dir for every request, it is 
possible to name the tmp dir after the request-id. The request-id is produced 
by the leader and is shared among all requests in the sequence.
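
A minimal sketch of this idea, under stated assumptions: RequestIdTmpDirSketch, 
install and demo are hypothetical names rather than Ratis APIs, the request-id 
is assumed to be a string that is safe to use as a directory name, and each 
chunk is written as its own file for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative simulation only; none of these names are Ratis classes.
final class RequestIdTmpDirSketch {
  /** Stages all chunks under tmp/<requestId>; returns the file count in sm. */
  static long install(Path root, String requestId, int chunkCount) throws IOException {
    Path smDir = Files.createDirectories(root.resolve("sm"));
    // Same directory for every request that carries this request-id.
    Path tmpDir = root.resolve("tmp").resolve(requestId);
    for (int i = 0; i < chunkCount; i++) {
      Files.createDirectories(tmpDir); // no-op when the directory already exists
      Files.write(tmpDir.resolve("chunk-" + i), new byte[] {(byte) i});
      boolean done = i == chunkCount - 1;
      if (done) {
        // Collect first, then move, so the listing is closed before files change.
        List<Path> staged;
        try (Stream<Path> files = Files.list(tmpDir)) {
          staged = files.collect(Collectors.toList());
        }
        for (Path file : staged) {
          Files.move(file, smDir.resolve(file.getFileName()),
              StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
    try (Stream<Path> files = Files.list(smDir)) {
      return files.count();
    }
  }

  /** Runs install in a throwaway temp directory. */
  static long demo(String requestId, int chunkCount) {
    try {
      return install(Files.createTempDirectory("ratis-sketch"), requestId, chunkCount);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because every request resolves to the same tmp dir, all chunks are present 
when the request tagged 'Done' arrives, and demo("request-1", 3) ends with 
three files in sm.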

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
