runzhiwang opened a new pull request #317:
URL: https://github.com/apache/incubator-ratis/pull/317


   ## What changes were proposed in this pull request?
   
   **What's the problem?**
   
   Writing a 1GB file through FileStore causes an OOM.
   
   **How to reproduce?**
   1. Raise the timeout from 100 seconds to 600 seconds by changing the return value below:
     public int getGlobalTimeoutSeconds() {
       return 100;
     }
   
   2. Raise the file size from 1M to 50M in this call:
   
   testMultipleFiles("file", 20, SizeInBytes.valueOf("1M"), newClient);
   
   **What's the reason?**
   
   1. In FileStore, `stateMachineCachingEnabled` takes its default, `CACHING_ENABLED_DEFAULT`, which is false, so the cache appends the complete LogEntryProto without removing the StateMachineData.
   ```
         if (stateMachineCachingEnabled) {
           // The stateMachineData will be cached inside the StateMachine itself.
           cache.appendEntry(ServerProtoUtils.removeStateMachineData(entry));
         } else {
           cache.appendEntry(entry);
         }
   ```
   
   2. However, `getEntrySize` does not count the size of the StateMachineData:
   ```
     static long getEntrySize(LogEntryProto entry) {
       final int serialized = ServerProtoUtils.removeStateMachineData(entry).getSerializedSize();
       return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
     }
   ```
   
   3. As a result, `isSegmentFull` stays false, so the open segment is not rolled and the cache cannot be evicted. `LogSegment#entryCache` then keeps too many LogEntryProto objects, each still containing its StateMachineData, and OOM happens.
   
   if (isSegmentFull(currentOpenSegment, entry)) {
           cache.rollOpenSegment(true);
           fileLogWorker.rollLogSegment(currentOpenSegment);
           checkAndEvictCache();
   }
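
The three steps above can be illustrated with a small, self-contained simulation. This is only a sketch, not Ratis code: the class name, the method `retainedBeforeFirstRoll`, the roll condition, and all sizes (100 bytes of metadata, 1 MiB of StateMachineData per entry, an 8 MiB segment limit) are assumptions chosen for illustration. It tracks how many bytes the cache retains before the open segment first rolls, depending on whether the accounted entry size includes the StateMachineData.

```java
/** Toy model of the size-accounting mismatch; hypothetical, not Ratis code. */
final class SegmentAccountingSketch {

  /**
   * Returns the bytes retained in the cache when the open segment first
   * rolls, or after all entries were appended if it never rolls.
   *
   * @param countStateMachineData whether the accounted entry size includes
   *                              the StateMachineData bytes
   */
  static long retainedBeforeFirstRoll(int numEntries, long metadataBytes,
      long stateMachineDataBytes, long segmentSizeMax, boolean countStateMachineData) {
    long accounted = 0; // what the segment believes it holds
    long retained = 0;  // what the cache really keeps on the heap
    for (int i = 0; i < numEntries; i++) {
      final long entrySize = countStateMachineData
          ? metadataBytes + stateMachineDataBytes
          : metadataBytes;
      // Simplified stand-in for isSegmentFull: roll when the next entry does not fit.
      if (accounted + entrySize > segmentSizeMax) {
        return retained;
      }
      accounted += entrySize;
      // The cache stores the complete entry, StateMachineData included.
      retained += metadataBytes + stateMachineDataBytes;
    }
    return retained;
  }

  public static void main(String[] args) {
    final long mib = 1L << 20;
    // Accounting ignores the StateMachineData: the segment never looks full.
    System.out.println(retainedBeforeFirstRoll(1000, 100, mib, 8 * mib, false));
    // Accounting includes the StateMachineData: the segment rolls after 7 entries.
    System.out.println(retainedBeforeFirstRoll(1000, 100, mib, 8 * mib, true));
  }
}
```

Under these assumed numbers the first call retains about 1 GB before any roll could happen, while the second retains about 7 MiB, which matches the reasoning that the mismatch between accounted and retained size is what lets the cache grow without bound.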
   
   **How to fix?**
   If `stateMachineCachingEnabled` is false, calculate the entry size without calling `removeStateMachineData`, so that the StateMachineData is counted.
   ```
     static long getEntrySize(LogEntryProto entry, boolean stateMachineCachingEnabled) {
       final int serialized = stateMachineCachingEnabled ?
           ServerProtoUtils.removeStateMachineData(entry).getSerializedSize() :
           entry.getSerializedSize();
       return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
     }
   ```
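
To make the effect of the new flag concrete, here is a dependency-free sketch of the same arithmetic. It is not the patch itself: `EntrySizeSketch` and `varintSize` are hypothetical names, `varintSize` is a hand-written stand-in for `CodedOutputStream.computeUInt32SizeNoTag` (one byte per 7 bits of payload), and the two serialized sizes are passed in as plain integers instead of being read from a LogEntryProto. The trailing 4 bytes are copied from the formula in the PR as-is.

```java
/** Plain-Java sketch of the entry-size formula; hypothetical, not Ratis code. */
final class EntrySizeSketch {

  /** Bytes needed to encode a non-negative value as an unsigned varint. */
  static int varintSize(int value) {
    int size = 1;
    while ((value & ~0x7F) != 0) { // more than 7 bits remain
      value >>>= 7;
      size++;
    }
    return size;
  }

  /**
   * Mirrors the proposed getEntrySize, with both serialized sizes supplied
   * by the caller (an assumption made to avoid a protobuf dependency).
   */
  static long getEntrySize(int serializedWithData, int serializedWithoutData,
      boolean stateMachineCachingEnabled) {
    final int serialized = stateMachineCachingEnabled
        ? serializedWithoutData // StateMachineData lives in the StateMachine
        : serializedWithData;   // StateMachineData stays in the log cache
    return serialized + varintSize(serialized) + 4L;
  }

  public static void main(String[] args) {
    final int withoutData = 100;                        // assumed metadata size
    final int withData = 50 * 1024 * 1024 + withoutData; // assumed 50M payload
    System.out.println(getEntrySize(withData, withoutData, true));
    System.out.println(getEntrySize(withData, withoutData, false));
  }
}
```

With these assumed inputs, the reported size is 105 bytes when caching is enabled and 52,428,908 bytes when it is disabled, so a single 50M entry is enough for the segment-full check to see the real footprint.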
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-1198
   
   ## How was this patch tested?
   
   Tested manually with the reproduction steps above: raise the return value of `getGlobalTimeoutSeconds()` from 100 to 600, and raise the file size in the `testMultipleFiles("file", 20, SizeInBytes.valueOf("1M"), newClient)` call from 1M to 50M.

