Hi Willem,

2018-01-28 9:51 GMT+08:00 Willem Jiang <[email protected]>:
> Yeah, this could help us resolve the issue of different alpha servers
> checking the same TxEvent.
> But what if some of the alphas are offline? Their timeout messages could
> not be handled any more.
>
OK, we could restart the alpha with the same server name, which I think
might be part of the recovery process. Anyway, if we cannot restart the
dead ones, it would take another migration process to update the server
name to one of the available alphas.

> Maybe we can create a leader from the alpha servers to do the job
> assignment work and supervise the timeout events.
>
Yeah, I think there is a similar leader election used in the Camel
cluster [1].

[1] https://github.com/nicolaferraro/spring-boot-camel-narayana-scalable/blob/master/src/main/java/com/example/CustomNarayanaRecoveryManagerBean.java

>
>
> Willem Jiang
>
> Blog: http://willemjiang.blogspot.com (English)
> http://jnn.iteye.com (Chinese)
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Sat, Jan 27, 2018 at 11:53 PM, Zheng Feng <[email protected]> wrote:
>
> > I assume that different alpha servers have different names, so we can
> > insert the name of the alpha server in the TxEvent record.
> > When an alpha server scans the TxEvent records for timeout handling, it
> > can select only the ones that match its own alpha name.
> > It looks like we don't need the lock here, but we have to make sure
> > the alpha server name is unique.
> >
> > 2018-01-27 14:36 GMT+08:00 郑扬勇 <[email protected]>:
> >
> > > It seems all the solutions need to introduce a "lock".
> > > If one event can be handled by only one alpha at a time, we may need
> > > an election mechanism?
> > >
> > > ------------------ Original Message ------------------
> > > From: "Eric Lee";<[email protected]>;
> > > Sent: Friday, January 26, 2018, 10:58 AM
> > > To: "dev"<[email protected]>;
> > >
> > > Subject: [Discussion] How to make sure events are handled only once
> > > among different stateless Saga pack alphas
> > >
> > >
> > >
> > > Background
> > > Currently, the transaction timeout is controlled by omega, which makes
> > > omega stateful. Being stateful makes omega recovery rely heavily on its
> > > previous state. Hence, we need to move the timeout management from
> > > omega to alpha to simplify the implementation of omega. After that,
> > > omega will be a stateless agent.
> > >
> > > Difficulty
> > > How do we make sure each timeout record is handled only once globally
> > > across multiple alpha servers? Each alpha server is also stateless. All
> > > state is stored in the database. Alpha periodically scans the timeout
> > > events and handles them one by one. Different alphas may process the
> > > same event at the same time, which should be avoided because each event
> > > should be handled only once.
> > >
> > > Possible Solutions:
> > > 1. Add an expireTime column to the TxEvent entity. Then lock access to
> > > the timeout event to avoid concurrent access to the same event. Since
> > > TxEvent may be involved in many operations, adding the lock may
> > > introduce latency in other transactions.
> > > 2. Create a new entity like the Command entity. Then lock access to
> > > this entity and update the status asynchronously when it is done.
> > > 3. Register timeout settings with alpha whenever omega starts. Then
> > > query the TxEvent and ServiceConfig tables to find the timeout events.
> > > This approach still cannot make sure each event is handled only once,
> > > because the scope of the lock is too wide to target a specific event.
> > >
> > > However, none of the above solutions is perfect for the problem,
> > > because the lock becomes invalid as soon as the query is done, and
> > > another alpha may query the database and process the same event before
> > > the previous alpha has processed the timeout event.
> > >
> > > The current implementation details can be found at
> > > https://github.com/apache/incubator-servicecomb-saga/pull/122 .
> > >
> > > Any suggestions are welcome.
> > >
> > >
> > > Best Regards!
> > > Eric Lee
> > >
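To make the name-based proposal from this thread concrete, here is a minimal sketch of how it might look. It is not the ServiceComb Saga implementation: the table `tx_event`, the columns `alpha_name`, `expire_time` and `handled`, and all function names are hypothetical, and SQLite merely stands in for whatever database alpha uses. Each alpha selects only the expired records carrying its own name, and `reassign` illustrates the migration step for a dead alpha that cannot be restarted.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table; the real TxEvent entity has other columns.
    conn.execute(
        "CREATE TABLE tx_event ("
        " id INTEGER PRIMARY KEY,"
        " alpha_name TEXT NOT NULL,"
        " expire_time INTEGER NOT NULL,"
        " handled INTEGER NOT NULL DEFAULT 0)"
    )


def record_event(conn, alpha_name, expire_time):
    # Store the owning alpha's (assumed unique) name with the event.
    cur = conn.execute(
        "INSERT INTO tx_event (alpha_name, expire_time) VALUES (?, ?)",
        (alpha_name, expire_time),
    )
    return cur.lastrowid


def scan_timeouts(conn, alpha_name, now):
    # An alpha sees only expired, unhandled events that carry its own name.
    rows = conn.execute(
        "SELECT id FROM tx_event"
        " WHERE alpha_name = ? AND expire_time <= ? AND handled = 0"
        " ORDER BY id",
        (alpha_name, now),
    ).fetchall()
    return [row[0] for row in rows]


def mark_handled(conn, event_id):
    conn.execute("UPDATE tx_event SET handled = 1 WHERE id = ?", (event_id,))


def reassign(conn, dead_alpha, live_alpha):
    # Migration step: hand a dead alpha's unhandled events to a live one.
    cur = conn.execute(
        "UPDATE tx_event SET alpha_name = ?"
        " WHERE alpha_name = ? AND handled = 0",
        (live_alpha, dead_alpha),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_event(conn, "alpha-1", 10)
    record_event(conn, "alpha-2", 10)
    for event_id in scan_timeouts(conn, "alpha-1", now=50):
        mark_handled(conn, event_id)
    # alpha-2 is assumed dead here, so its pending events move to alpha-1.
    print(reassign(conn, "alpha-2", "alpha-1"))
    print(scan_timeouts(conn, "alpha-1", now=50))
```

The sketch avoids a lock only as long as alpha names really are unique and exactly one process runs `reassign` for a given dead alpha; deciding who runs it is where the leader election mentioned above would come in.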
