[ 
https://issues.apache.org/jira/browse/RATIS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duong updated RATIS-1979:
-------------------------
    Description: 
With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on RaftLog 
cache eviction to release the zero-copy input streams.

There's a problems, with stateMachineCachingEnabled, the cache-size of 
LogEntryproto is calculated with StateMachineData trimmed. That leave the 
actual cached size if a log entry is too small comparing to the amount of 
direct memory backed for the StateMachineData. 

 
{code:java}
static long getEntrySize(LogEntryProto entry, Op op) {
  LogEntryProto e = entry;
  if (op == Op.CHECK_SEGMENT_FILE_FULL) {
    e = LogProtoUtils.removeStateMachineData(entry);
  } else if (op == Op.LOAD_SEGMENT_FILE || op == 
Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
    Preconditions.assertTrue(entry == 
LogProtoUtils.removeStateMachineData(entry),
        () -> "Unexpected LogEntryProto with StateMachine data: op=" + op + ", 
entry=" + entry);
  } else {
    Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE 
|| op == Op.REMOVE_CACHE,
        () -> "Unexpected op " + op + ", entry=" + entry);
  }
  final int serialized = e.getSerializedSize();
  return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
} {code}
 

With the default 200MB limit for raft log cache, cache eviction is likely never 
happens until the server run out of direct memory for zero-copy.

 

Maybe for cache calculation, we should take the StateMachineData size into 
account.

  was:
Found this issue when running Ozone integ test with zero-copy. Ref CI: 
[https://github.com/duongkame/ozone/actions/runs/8655610655/job/23739252086]

Problem summary: on the follower side, the LogEntry's clientId buffer is reset 
(to all zero) while the log entry is kept in RaftLog cache. This cause an 
exception.
{code:java}
org.apache.ratis.thirdparty.com.google.common.util.concurrent.UncheckedExecutionException:
 java.lang.IllegalStateException: Failed to create ClientId: UUID 
00000000-0000-0000-0000-000000000000 is reserved.
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2087)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
    at org.apache.ratis.protocol.RaftId$Factory.valueOf(RaftId.java:64)
    at org.apache.ratis.protocol.RaftId$Factory.valueOf(RaftId.java:71)
    at org.apache.ratis.protocol.ClientId.valueOf(ClientId.java:45)
    at 
org.apache.ratis.protocol.ClientInvocationId.valueOf(ClientInvocationId.java:41)
    at 
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1825)
    at 
org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:259)
    at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Failed to create ClientId: UUID 
00000000-0000-0000-0000-000000000000 is reserved.
    at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
    at org.apache.ratis.protocol.RaftId.<init>(RaftId.java:91)
    at org.apache.ratis.protocol.ClientId.<init>(ClientId.java:53)
    at org.apache.ratis.protocol.ClientId.<init>(ClientId.java:28)
    at org.apache.ratis.protocol.ClientId$1.newInstance(ClientId.java:32)
    at org.apache.ratis.protocol.ClientId$1.newInstance(ClientId.java:29)
    at org.apache.ratis.protocol.RaftId$Factory.lambda$valueOf$0(RaftId.java:64)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4938)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3576)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2318)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2191)
    at 
org.apache.ratis.thirdparty.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
 {code}
 

With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on RaftLog 
cache eviction to release the zero-copy input streams.

There's a problems, with stateMachineCachingEnabled, the cache-size of 
LogEntryproto is calculated with StateMachineData trimmed. That leave the 
actual cached size if a log entry is too small comparing to the amount of 
direct memory backed for the StateMachineData. 

 
{code:java}
static long getEntrySize(LogEntryProto entry, Op op) {
  LogEntryProto e = entry;
  if (op == Op.CHECK_SEGMENT_FILE_FULL) {
    e = LogProtoUtils.removeStateMachineData(entry);
  } else if (op == Op.LOAD_SEGMENT_FILE || op == 
Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
    Preconditions.assertTrue(entry == 
LogProtoUtils.removeStateMachineData(entry),
        () -> "Unexpected LogEntryProto with StateMachine data: op=" + op + ", 
entry=" + entry);
  } else {
    Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE 
|| op == Op.REMOVE_CACHE,
        () -> "Unexpected op " + op + ", entry=" + entry);
  }
  final int serialized = e.getSerializedSize();
  return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 4L;
} {code}
 

With the default 200MB limit for raft log cache, cache eviction is likely never 
happens until the server run out of direct memory for zero-copy.

 

Maybe for cache calculation, we should take the StateMachineData size into 
account.


> Allow StateMachine.read to return a ReferentCountedObject
> ---------------------------------------------------------
>
>                 Key: RATIS-1979
>                 URL: https://issues.apache.org/jira/browse/RATIS-1979
>             Project: Ratis
>          Issue Type: Sub-task
>          Components: StateMachine
>            Reporter: Duong
>            Assignee: Duong
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: 1062_review.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With zero-copy ([https://github.com/apache/ratis/pull/990]), we rely on 
> RaftLog cache eviction to release the zero-copy input streams.
> There's a problems, with stateMachineCachingEnabled, the cache-size of 
> LogEntryproto is calculated with StateMachineData trimmed. That leave the 
> actual cached size if a log entry is too small comparing to the amount of 
> direct memory backed for the StateMachineData. 
>  
> {code:java}
> static long getEntrySize(LogEntryProto entry, Op op) {
>   LogEntryProto e = entry;
>   if (op == Op.CHECK_SEGMENT_FILE_FULL) {
>     e = LogProtoUtils.removeStateMachineData(entry);
>   } else if (op == Op.LOAD_SEGMENT_FILE || op == 
> Op.WRITE_CACHE_WITH_STATE_MACHINE_CACHE) {
>     Preconditions.assertTrue(entry == 
> LogProtoUtils.removeStateMachineData(entry),
>         () -> "Unexpected LogEntryProto with StateMachine data: op=" + op + 
> ", entry=" + entry);
>   } else {
>     Preconditions.assertTrue(op == Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE 
> || op == Op.REMOVE_CACHE,
>         () -> "Unexpected op " + op + ", entry=" + entry);
>   }
>   final int serialized = e.getSerializedSize();
>   return serialized + CodedOutputStream.computeUInt32SizeNoTag(serialized) + 
> 4L;
> } {code}
>  
> With the default 200MB limit for raft log cache, cache eviction is likely 
> never happens until the server run out of direct memory for zero-copy.
>  
> Maybe for cache calculation, we should take the StateMachineData size into 
> account.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to