runzhiwang opened a new pull request #317:
URL: https://github.com/apache/incubator-ratis/pull/317
## What changes were proposed in this pull request?
**What's the problem?**
Writing a 1 GB file through FileStore causes an OutOfMemoryError (OOM).
**How to reproduce?**
1. Change the timeout from 100 seconds to 600 seconds:
public int getGlobalTimeoutSeconds() {
return 100;
}
2. Change the file size from 1M to 50M:
testMultipleFiles("file", 20, SizeInBytes.valueOf("1M"), newClient);
**What's the reason?**
1. In FileStore, `stateMachineCachingEnabled` takes its default value
`CACHING_ENABLED_DEFAULT`, which is false, so the complete LogEntryProto is
appended to the cache without removing the StateMachineData.
```
if (stateMachineCachingEnabled) {
  // The stateMachineData will be cached inside the StateMachine itself.
  cache.appendEntry(ServerProtoUtils.removeStateMachineData(entry));
} else {
  cache.appendEntry(entry);
}
```
2. However, `getEntrySize` does not count the size of the StateMachineData:
```
static long getEntrySize(LogEntryProto entry) {
  final int serialized =
      ServerProtoUtils.removeStateMachineData(entry).getSerializedSize();
  return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized)
      + 4L;
}
```
3. As a result, `isSegmentFull` keeps returning false, so the cache is not
evicted. LogSegment#entryCache then holds too many LogEntryProto instances,
each still containing its StateMachineData, and OOM happens:
if (isSegmentFull(currentOpenSegment, entry)) {
cache.rollOpenSegment(true);
fileLogWorker.rollLogSegment(currentOpenSegment);
checkAndEvictCache();
}
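The effect of the three steps above can be illustrated with a small standalone sketch. It does not use the Ratis classes: `Entry`, `accountedSize`, `retainedSize`, and `retainedBytesBeforeRoll` are hypothetical stand-ins, the "segment full" check is assumed to simply compare accumulated entry sizes against a limit, and the byte counts are made-up numbers.

```java
class SegmentAccountingSketch {
  /** Hypothetical stand-in for LogEntryProto: a small header plus a large payload. */
  static final class Entry {
    final int headerBytes;
    final int stateMachineDataBytes;

    Entry(int headerBytes, int stateMachineDataBytes) {
      this.headerBytes = headerBytes;
      this.stateMachineDataBytes = stateMachineDataBytes;
    }
  }

  /** Number of bytes a protobuf-style varint needs for a non-negative int. */
  static int varintSize(int value) {
    int size = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      size++;
    }
    return size;
  }

  /** Same shape as the original getEntrySize: the payload is not counted. */
  static long accountedSize(Entry e) {
    final int serialized = e.headerBytes;
    return serialized + varintSize(serialized) + 4L;
  }

  /** Bytes kept on the heap when the complete entry is cached. */
  static long retainedSize(Entry e) {
    return (long) e.headerBytes + e.stateMachineDataBytes;
  }

  /** Appends identical entries until the accounted size would exceed the limit
   *  and returns the bytes actually retained at that point. */
  static long retainedBytesBeforeRoll(long segmentLimit, Entry e) {
    long accounted = 0;
    long retained = 0;
    while (accounted + accountedSize(e) <= segmentLimit) {
      accounted += accountedSize(e);
      retained += retainedSize(e);
    }
    return retained;
  }

  public static void main(String[] args) {
    final Entry e = new Entry(64, 1 << 20); // 64-byte header, 1 MiB payload (assumed)
    final long limit = 8L << 20;            // 8 MiB segment limit (assumed)
    System.out.println("accounted bytes per entry: " + accountedSize(e));
    System.out.println("retained bytes before roll: " + retainedBytesBeforeRoll(limit, e));
  }
}
```

In this toy model, the accounted size per entry is tiny compared with what is retained, so the segment is considered full only after far more data than the limit has accumulated in memory.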
**How to fix?**
If `stateMachineCachingEnabled` is false, calculate the entry size without
calling `removeStateMachineData`, so that the StateMachineData is counted.
```
static long getEntrySize(LogEntryProto entry,
    boolean stateMachineCachingEnabled) {
  final int serialized = stateMachineCachingEnabled ?
      ServerProtoUtils.removeStateMachineData(entry).getSerializedSize() :
      entry.getSerializedSize();
  return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized)
      + 4L;
}
```
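To see how the flag changes the result, here is a minimal standalone sketch of the same branching. It is not the Ratis code: the entry is modelled as two hypothetical byte counts (`headerBytes`, `stateMachineDataBytes`), `varintSize` is a local helper standing in for the varint length computation, and protobuf field overhead for the payload is ignored for simplicity.

```java
class EntrySizeSketch {
  /** Number of bytes a protobuf-style varint needs for a non-negative int. */
  static int varintSize(int value) {
    int size = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      size++;
    }
    return size;
  }

  /**
   * Simplified model of the fixed size calculation: the payload is left out
   * only when the state machine caches it itself.
   */
  static long getEntrySize(int headerBytes, int stateMachineDataBytes,
      boolean stateMachineCachingEnabled) {
    final int serialized = stateMachineCachingEnabled
        ? headerBytes
        : headerBytes + stateMachineDataBytes;
    return serialized + varintSize(serialized) + 4L;
  }

  public static void main(String[] args) {
    final int header = 64;        // assumed header size in bytes
    final int payload = 1 << 20;  // assumed 1 MiB of state machine data
    System.out.println("caching enabled:  " + getEntrySize(header, payload, true));
    System.out.println("caching disabled: " + getEntrySize(header, payload, false));
  }
}
```

With caching disabled, the computed size now tracks what the cache actually holds, so a size-based segment check can trigger rolling and eviction.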
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-1198
## How was this patch tested?
Tested manually:
1. Change the timeout from 100 seconds to 600 seconds:
public int getGlobalTimeoutSeconds() {
return 100;
}
2. Change the file size from 1M to 50M:
testMultipleFiles("file", 20, SizeInBytes.valueOf("1M"), newClient);