Re: [I] file&raft doGlobalRollback and doGlobalCommit may have concurrency issues with retry tasks [incubator-seata]

via GitHub Sun, 17 Nov 2024 22:05:54 -0800


funky-eyes commented on issue #7004:
URL: https://github.com/apache/incubator-seata/issues/7004#issuecomment-2482031263


   #7005 fixed the Raft NPE caused by concurrency, but the issue where a
phase-two retry and the phase-two decision may run at the same time has not
been addressed yet.
   
   Plan 1 (original): add a local lock. This resolves the issue in the file and
Raft modes, where storage and computation live in the same server, but taking a
pessimistic lock to guard against such a low-probability event causes
unnecessary performance overhead. It also remains ineffective in DB and Redis
modes, where several TC nodes share one store and a process-local lock cannot
coordinate them.
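A minimal Java sketch of Plan 1, for illustration only: `XidLocalLocks`, `withXidLock` and `tryWithXidLock` are hypothetical names, not Seata APIs. The idea is that the phase-two decision path and the retry task both go through the same per-xid `ReentrantLock`, so within one server they cannot run doGlobalRollback/doGlobalCommit on the same xid concurrently. Because the locks live in a single JVM, the sketch also shows why this plan does nothing for DB or Redis mode.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical per-xid lock registry illustrating Plan 1; not part of Seata.
// The locks exist only in this JVM, so they cannot coordinate several TC
// nodes that share a DB or Redis store.
final class XidLocalLocks {
    // Entries are never evicted in this sketch; a real version would remove
    // them once the global session has finished.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Decision path: block until the xid's lock is free, then run the action.
    <T> T withXidLock(String xid, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(xid, k -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    // Retry-task path: if another thread currently holds the xid's lock,
    // skip this round (return false) instead of waiting.
    boolean tryWithXidLock(String xid, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(xid, k -> new ReentrantLock());
        if (!lock.tryLock()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The pessimistic cost the comment mentions is visible here: every phase-two call pays for a map lookup and a lock acquisition, even though the race it guards against is rare.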
   
   Plan 2: add a dynamic transaction compensation time. The deadtime of a long
transaction (the interval after which a transaction stuck in the rollbacking or
committing state, for example because of an exception, is considered to need
compensation by the asynchronous retry task; 2 minutes 10 seconds by default)
could be specified per transaction through the @GlobalTransactional
annotation, which avoids the concurrency. (The globally configurable
server.retryDeadThreshold already exists, but it is not fine-grained enough.)
The drawback is that in DB storage mode this requires new table columns, so
users must apply the new table schema before upgrading the server.
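A sketch of the check Plan 2 implies, under stated assumptions: `RetryDeadCheck` and the per-transaction threshold parameter are hypothetical, standing in for a value that would be set through `@GlobalTransactional` under this plan; the deadtime is measured from the transaction's begin time; and a non-positive per-transaction value means "fall back to the global default of 2 min 10 s (130,000 ms)".

```java
// Hypothetical sketch of Plan 2; names are illustrative, not Seata's API.
final class RetryDeadCheck {
    // Global fallback, matching the 2 min 10 s default described for
    // server.retryDeadThreshold.
    static final long DEFAULT_DEAD_THRESHOLD_MS = 130_000L;

    // perTxThresholdMs <= 0 means "not set on this transaction". In DB mode
    // this value would need its own column, which is the plan's drawback.
    static long effectiveThreshold(long perTxThresholdMs) {
        return perTxThresholdMs > 0 ? perTxThresholdMs : DEFAULT_DEAD_THRESHOLD_MS;
    }

    // The retry task only picks up a committing/rollbacking session once it
    // has been in flight longer than its per-transaction or global deadtime.
    static boolean shouldCompensate(long beginTimeMs, long perTxThresholdMs, long nowMs) {
        return nowMs - beginTimeMs > effectiveThreshold(perTxThresholdMs);
    }
}
```

For example, a transaction still rolling back 3 minutes after it began would be picked up by the retry task under the global default, but left alone if it had declared a 10-minute deadtime.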
   
   Plan 3: use a consensus algorithm (Raft + DB/Redis or other storage modes)
to detect whether the node that made the phase-two decision has gone offline.
The compensation task would then only compensate rollbacking transactions whose
xid belongs to a server that is offline. If the server that owns the xid is
online, the transaction should not be compensated: if an exception occurs
during the synchronous phase-two processing, the transaction changes its status
instead of staying in rollbacking. Therefore, while the owning server is alive,
a rollbacking transaction can only be one that the live node is still
processing, and no compensation is required.
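A sketch of the filtering step in Plan 3, assuming the xid has the form `ip:port:transactionId` so the owning TC can be read from it, and that the set of live servers is supplied by some consensus or membership layer; `OwnerAwareCompensation` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Plan 3; not Seata's API.
final class OwnerAwareCompensation {
    // Assumption: xid = "ip:port:transactionId", so the owner address is
    // everything before the last ':' (this also works for IPv6 hosts).
    static String ownerOf(String xid) {
        int idx = xid.lastIndexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException("unexpected xid format: " + xid);
        }
        return xid.substring(0, idx);
    }

    // Keep only rollbacking xids whose owning TC is missing from the live set.
    // A live owner means the rollbacking state is still being worked on there.
    static List<String> selectForCompensation(List<String> rollbackingXids, Set<String> liveServers) {
        List<String> result = new ArrayList<>();
        for (String xid : rollbackingXids) {
            if (!liveServers.contains(ownerOf(xid))) {
                result.add(xid);
            }
        }
        return result;
    }
}
```

With `10.0.0.1:8091` alive and `10.0.0.2:8091` gone, only the second server's rollbacking xid would be handed to the compensation task; the hard part this sketch leaves out is obtaining a reliable live set, which is what the consensus layer is for.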


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@seata.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

