lwllvyb commented on PR #2106: URL: https://github.com/apache/incubator-uniffle/pull/2106#issuecomment-2340020080
I added some debug logging to verify whether duplicate blockIds occur, and got the following logs:

```
[2024-09-09 23:51:03.422] [epollEventLoopGroup-3-29] [WARN] AbstractShuffleBuffer - append partitionId=8920 blockId=37413336496 is duplicated, prev block=ShufflePartitionedBlock{blockId[37413336496], length[11896], size[11928], uncompressLength[21560], crc[2412607028], taskAttemptId[144816]}
[2024-09-09 23:51:03.543] [Grpc-140] [ERROR] ShuffleServerGrpcService - Error happened when get shuffle result for appId[application_1703049085550_20299615_1725895467752], shuffleId[0], partitions[8920]
org.apache.uniffle.common.exception.RssException: Inconsistent block number for partitions: [8920]. Excepted: 43995, actual: 43996
    at org.apache.uniffle.server.ShuffleTaskManager.getFinishedBlockIds(ShuffleTaskManager.java:656)
    at org.apache.uniffle.server.ShuffleServerGrpcService.getShuffleResultForMultiPart(ShuffleServerGrpcService.java:939)
    at org.apache.uniffle.proto.ShuffleServerGrpc$MethodHandlers.invoke(ShuffleServerGrpc.java:1056)
    at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
    at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
    at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
    at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
    at org.apache.uniffle.common.rpc.ClientContextServerInterceptor$1.onHalfClose(ClientContextServerInterceptor.java:63)
    at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
    at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```

From these logs, the cause might be that a duplicate blockId is added to the same buffer in the bufferPool and replaces the previously added block with the same blockId. The replaced block's memory stays counted but is no longer reachable from the buffer, resulting in an untraceable memory leak. @zuston
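To illustrate the suspected mechanism, here is a minimal sketch. It is not Uniffle's actual buffer code: the class `DuplicateBlockLeakSketch`, its fields, and its methods are hypothetical, and it assumes the buffer is keyed by blockId while memory usage is tracked in a separate counter. Under those assumptions, a second append with the same blockId is counted on arrival but overwrites the earlier entry, so a later flush can only release the size of the surviving block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Uniffle's real implementation.
// Assumption: blocks are stored in a map keyed by blockId, and used
// memory is tracked by a separate counter.
final class DuplicateBlockLeakSketch {
    private final Map<Long, Integer> blocks = new HashMap<>();
    private long usedMemory = 0;

    // Every append is charged to the counter, even a duplicate one.
    void append(long blockId, int size) {
        usedMemory += size;
        Integer prev = blocks.put(blockId, size); // replaces any earlier entry
        if (prev != null) {
            System.out.println("blockId=" + blockId + " is duplicated, prev size=" + prev);
        }
    }

    // Flush releases only what is still reachable through the map.
    long flush() {
        long released = 0;
        for (int size : blocks.values()) {
            released += size;
        }
        blocks.clear();
        usedMemory -= released;
        return released;
    }

    long usedMemory() {
        return usedMemory;
    }

    public static void main(String[] args) {
        DuplicateBlockLeakSketch buffer = new DuplicateBlockLeakSketch();
        buffer.append(37413336496L, 11928);
        buffer.append(37413336496L, 11928); // same blockId arrives again
        long released = buffer.flush();
        System.out.println("released=" + released + ", still counted=" + buffer.usedMemory());
    }
}
```

In this sketch the counter still holds 11928 bytes after the flush, with no block left in the map to account for them. Rejecting or separately releasing the duplicate at append time would keep the counter and the map consistent.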
