[
https://issues.apache.org/jira/browse/RATIS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated RATIS-1979:
-------------------------
Description:
With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on RaftLog
cache eviction to release the zero-copy input streams.
There's a problems, with stateMachineCachingEnabled, the cache-size of
LogEntryproto is calculated with StateMachineData trimmed. That leave the
actual cached size if a log entry is too small comparing to the amount of
direct memory backed for the StateMachineData.
{code:java}
static long getEntrySize(LogEntryProto entry, Op op) {
LogEntryProto e = entry;
if (op == Op.CHECK_SEGMENT_FILE_FULL) {
e = LogProtoUtils.removeStateMachineData(entry);
} else if (op == Op.LOAD_SEGMENT_FILE || op ==
Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
Preconditions.assertTrue(entry ==
LogProtoUtils.removeStateMachineData(entry),
() -> "Unexpected LogEntryProto with StateMachine data: op=" + op + ",
entry=" + entry);
} else {
Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE
|| op == Op.REMOVE_CACHE,
() -> "Unexpected op " + op + ", entry=" + entry);
}
final int serialized = e.getSerializedSize();
return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
} {code}
With the default 200MB limit for raft log cache, cache eviction is likely never
happens until the server run out of direct memory for zero-copy.
Maybe for cache calculation, we should take the StateMachineData size into
account.
was:
Found this issue when running Ozone integ test with zero-copy. Ref CI:
[https://github.com/duongkame/ozone/actions/runs/8655610655/job/23739252086]
Problem summary: on the follower side, the LogEntry's clientId buffer is reset
(to all zero) while the log entry is kept in RaftLog cache. This cause an
exception.
{code:java}
org.apache.ratis.thirdparty.com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.IllegalStateException: Failed to create ClientId: UUID
00000000-0000-0000-0000-000000000000 is reserved.
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2087)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
at org.apache.ratis.protocol.RaftId$Factory.valueOf(RaftId.java:64)
at org.apache.ratis.protocol.RaftId$Factory.valueOf(RaftId.java:71)
at org.apache.ratis.protocol.ClientId.valueOf(ClientId.java:45)
at
org.apache.ratis.protocol.ClientInvocationId.valueOf(ClientInvocationId.java:41)
at
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1825)
at
org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:259)
at
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Failed to create ClientId: UUID
00000000-0000-0000-0000-000000000000 is reserved.
at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
at org.apache.ratis.protocol.RaftId.<init>(RaftId.java:91)
at org.apache.ratis.protocol.ClientId.<init>(ClientId.java:53)
at org.apache.ratis.protocol.ClientId.<init>(ClientId.java:28)
at org.apache.ratis.protocol.ClientId$1.newInstance(ClientId.java:32)
at org.apache.ratis.protocol.ClientId$1.newInstance(ClientId.java:29)
at org.apache.ratis.protocol.RaftId$Factory.lambda$valueOf$0(RaftId.java:64)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4938)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3576)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2318)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2191)
at
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
{code}
With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on RaftLog
cache eviction to release the zero-copy input streams.
There's a problems, with stateMachineCachingEnabled, the cache-size of
LogEntryproto is calculated with StateMachineData trimmed. That leave the
actual cached size if a log entry is too small comparing to the amount of
direct memory backed for the StateMachineData.
{code:java}
static long getEntrySize(LogEntryProto entry, Op op) {
LogEntryProto e = entry;
if (op == Op.CHECK_SEGMENT_FILE_FULL) {
e = LogProtoUtils.removeStateMachineData(entry);
} else if (op == Op.LOAD_SEGMENT_FILE || op ==
Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
Preconditions.assertTrue(entry ==
LogProtoUtils.removeStateMachineData(entry),
() -> "Unexpected LogEntryProto with StateMachine data: op=" + op + ",
entry=" + entry);
} else {
Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE
|| op == Op.REMOVE_CACHE,
() -> "Unexpected op " + op + ", entry=" + entry);
}
final int serialized = e.getSerializedSize();
return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
} {code}
With the default 200MB limit for raft log cache, cache eviction is likely never
happens until the server run out of direct memory for zero-copy.
Maybe for cache calculation, we should take the StateMachineData size into
account.
> Allow StateMachine.read to return a ReferentCountedObject
> ---------------------------------------------------------
>
> Key: RATIS-1979
> URL: https://issues.apache.org/jira/browse/RATIS-1979
> Project: Ratis
> Issue Type: Sub-task
> Components: StateMachine
> Reporter: Duong
> Assignee: Duong
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: 1062_review.patch
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on
> RaftLog cache eviction to release the zero-copy input streams.
> There's a problems, with stateMachineCachingEnabled, the cache-size of
> LogEntryproto is calculated with StateMachineData trimmed. That leave the
> actual cached size if a log entry is too small comparing to the amount of
> direct memory backed for the StateMachineData.
>
> {code:java}
> static long getEntrySize(LogEntryProto entry, Op op) {
> LogEntryProto e = entry;
> if (op == Op.CHECK_SEGMENT_FILE_FULL) {
> e = LogProtoUtils.removeStateMachineData(entry);
> } else if (op == Op.LOAD_SEGMENT_FILE || op ==
> Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
> Preconditions.assertTrue(entry ==
> LogProtoUtils.removeStateMachineData(entry),
> () -> "Unexpected LogEntryProto with StateMachine data: op=" + op +
> ", entry=" + entry);
> } else {
> Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE
> || op == Op.REMOVE_CACHE,
> () -> "Unexpected op " + op + ", entry=" + entry);
> }
> final int serialized = e.getSerializedSize();
> return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) +
> 4L;
> } {code}
>
> With the default 200MB limit for raft log cache, cache eviction is likely
> never happens until the server run out of direct memory for zero-copy.
>
> Maybe for cache calculation, we should take the StateMachineData size into
> account.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)