[
https://issues.apache.org/jira/browse/IOTDB-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621006#comment-17621006
]
Song Ziyang commented on IOTDB-4652:
------------------------------------
[~xingtanzjr] [~SpriCoder] The direct memory leak is a known issue in
Ratis 2.3.0; see https://issues.apache.org/jira/browse/IOTDB-4509. This leak
*does not affect correctness*, and the amount of leaked memory is small, so it
does not affect other modules.
The error is also fixed in Ratis 2.4.0, and the latest master branch should no
longer show the leak error.
> [ MultiLeaderConsensus ] The data on the replicas is inconsistent
> -----------------------------------------------------------------
>
> Key: IOTDB-4652
> URL: https://issues.apache.org/jira/browse/IOTDB-4652
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: 张洪胤
> Priority: Major
> Attachments: image-2022-10-14-16-04-28-847.png,
> image-2022-10-14-16-13-37-165.png
>
>
> master_1013_00dc222
> schema : ratis
> data : multiLeader
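> 3 replicas, 3C3D
> The bm (benchmark) write completed (all writes reported as successful), followed by a flush.
> Querying the data shows that {color:#DE350B}*the data is inconsistent across replicas*{color}.
> Querying ip68 (final state: leader of this DataRegion[66]):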
> ./sbin/start-cli.sh -h 192.168.10.68 -e "select count(s_0) from
> root.test.g_13.d_1013"
> 6 data points are missing.
> !image-2022-10-14-16-04-28-847.png!
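> Analysis of the data for device root.test.g_13.d_1013 on ip68/ip62/ip66:
> ip68: 94 points, 6 missing
> ip62: 100 points, correct
> ip66: 100 points, correct
> ip66 was the leader at one point (relatively little data was written to it directly). When ip66
> synchronized this region's data to ip68, an ERROR was logged ({color:#DE350B}*question: if a sync failure is unavoidable, will the sync be retried later?*{color}):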
> 2022-10-14 10:55:02,593 [pool-96-IoTDB-LogDispatcher-DataRegion[66]-2] ERROR
> o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:415 - Can not sync logs to peer
> Peer{groupId=DataRegion[66], endpoint=TEndPoint(ip:192.168.10.68,
> port:40010)} because
> java.io.IOException: Borrow client from pool for node
> TEndPoint(ip:192.168.10.68, port:40010) failed.
> at
> org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61)
> at
> org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.sendBatchAsync(LogDispatcher.java:404)
> at
> org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:289)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.NoSuchElementException: Timeout waiting for idle object,
> borrowMaxWaitMillis=10000
> at
> org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:453)
> at
> org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
> at
> org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
> ... 7 common frames omitted
> Also note that ip66 logged a Ratis error reporting a detected off-heap (direct) memory leak:
> 2022-10-14 10:39:26,022 [grpc-default-worker-ELG-3-40] ERROR
> o.a.r.t.i.n.u.ResourceLeakDetector:319 - LEAK: ByteBuf.release() was not
> called before it's garbage-collected. See
> https://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records:
> Created at:
>
> org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:401)
>
> org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
>
> org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
>
> org.apache.ratis.thirdparty.io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
>
> org.apache.ratis.thirdparty.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
>
> org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
>
> org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780)
>
> org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>
> org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>
> org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>
> org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> java.lang.Thread.run(Thread.java:748)
> Test environment
> 1. 192.168.10.62/66/68: physical machines, 72 CPUs, 256 GB
> bm runs on ip64; see the attachment for its configuration
> ConfigNode
> MAX_HEAP_SIZE="16G"
> MAX_DIRECT_MEMORY_SIZE="8G"
>
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> connection_timeout_ms=1200000
> DataNode
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> connection_timeout_ms=1200000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=36000000
> enable_auto_create_schema=false
> 2. bm write
> See the attachment for the configuration
> !image-2022-10-14-16-13-37-165.png!
> 3. Query, verify data correctness, analyze the results, and analyze the cluster logs.
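The replica comparison described in the report can be sketched as follows. This is a minimal illustration, assuming the per-replica point counts have already been collected (for example by running the `select count(s_0)` query against each node). The helper name `find_divergent_replicas` and its input shape are hypothetical and are not part of IoTDB. The sketch treats the most common count as the reference value, which assumes that a majority of replicas hold the correct data.

```python
from collections import Counter


def find_divergent_replicas(counts):
    """Return {replica: count} for replicas that disagree with the most common count.

    `counts` maps a replica label to the point count it reported.
    Hypothetical helper for illustration only; not an IoTDB API.
    """
    if not counts:
        return {}
    # Assumption: the most common count is the correct one, i.e. a majority
    # of replicas are consistent with each other.
    reference, _ = Counter(counts.values()).most_common(1)[0]
    return {replica: n for replica, n in counts.items() if n != reference}


# Counts reported in the issue for device root.test.g_13.d_1013.
observed = {"ip68": 94, "ip62": 100, "ip66": 100}
print(find_divergent_replicas(observed))
```

With the counts from this issue, only ip68 is flagged. A check like this only detects the divergence; it does not explain whether the missing points will be synchronized later.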
--
This message was sent by Atlassian Jira
(v8.20.10#820010)