funky-eyes commented on issue #7004: URL: https://github.com/apache/incubator-seata/issues/7004#issuecomment-2482031263
#7005 fixed the NPE that occurred in Raft mode under concurrency, but the problem that the phase-two retry and the phase-two decision may run at the same time has not been handled yet.

Plan 1 (original): add a local lock. This works in raft and file modes, where storage and compute are co-located, but taking a pessimistic lock for such a low-probability event adds unnecessary performance overhead, and it is still ineffective in db and redis modes.

Plan 2: add a dynamic transaction compensation time. The deadtime is the interval after which a transaction still in the rollbacking or committing state is assumed to have hit an exception and is handed to the asynchronous compensation task; it defaults to 2 minutes 10 seconds. For long transactions it could be specified per transaction through the GlobalTransactional annotation, which avoids the concurrency. (A global setting, server.retryDeadThreshold, already exists, but it is not fine-grained enough.) The drawback is that db storage mode needs a new table column, so users must apply the table schema change before upgrading the server.

Plan 3: consensus algorithm. Use Raft together with db/redis and the other storage modes to detect whether the node that is making the decision has gone offline.
The compensation task compensates only transactions in the rollbacking state whose xid belongs to a server that has already gone offline. While the server that owns the xid is online, the transaction should not be compensated: if an exception occurs during synchronization, the transaction changes status and does not stay in rollbacking. In other words, if the server corresponding to the xid is alive, a transaction in rollbacking can only be actively running on that live node, so it needs no compensation.
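The rule in Plan 3 can be sketched as a filter over the sessions that the compensation task scans. This is a minimal illustration under stated assumptions, not Seata code: the Session class, the Status enum, selectForCompensation, and the liveServers set are all hypothetical, and the sketch assumes the xid has the form ip:port:transactionId so that the owning server address can be read from its prefix. How the set of live servers is obtained (Raft membership, a registry, or something else) is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CompensationFilterSketch {

    // Hypothetical status values; only the two states discussed above.
    enum Status { COMMITTING, ROLLBACKING }

    // Hypothetical, simplified session view (not Seata's GlobalSession).
    static class Session {
        final String xid;
        final Status status;

        Session(String xid, Status status) {
            this.xid = xid;
            this.status = status;
        }
    }

    // Assumption: xid looks like "ip:port:transactionId", so the owner
    // address is everything before the last colon.
    static String ownerAddress(String xid) {
        int idx = xid.lastIndexOf(':');
        return idx < 0 ? xid : xid.substring(0, idx);
    }

    // Keep only rollbacking sessions whose owning server is no longer alive.
    // A rollbacking session on a live server is assumed to be in progress there.
    static List<Session> selectForCompensation(List<Session> sessions, Set<String> liveServers) {
        List<Session> selected = new ArrayList<>();
        for (Session s : sessions) {
            boolean ownerOffline = !liveServers.contains(ownerAddress(s.xid));
            if (s.status == Status.ROLLBACKING && ownerOffline) {
                selected.add(s);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(
            new Session("10.0.0.1:8091:1001", Status.ROLLBACKING), // owner alive: skipped
            new Session("10.0.0.2:8091:1002", Status.ROLLBACKING), // owner offline: compensated
            new Session("10.0.0.2:8091:1003", Status.COMMITTING)); // not rollbacking: skipped
        Set<String> liveServers = new HashSet<>(Arrays.asList("10.0.0.1:8091"));

        for (Session s : selectForCompensation(sessions, liveServers)) {
            System.out.println(s.xid); // prints 10.0.0.2:8091:1002
        }
    }
}
```

Unlike Plan 1, a check like this takes no lock on the hot path, and unlike Plan 2 it needs no new table column; its correctness rests entirely on the liveness information being accurate.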