Hi Zhang Lei,

Thanks for the update. The test results look very good: we get a 10x performance improvement on the Alpha side.

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Tue, Jul 9, 2019 at 10:10 PM Zhang Lei <[email protected]> wrote:
>
> Hi, all
>
> State machine-based Alpha improves performance by an order of magnitude. I
> have pushed Alpha's benchmark report [1] and added the stress test tool
> module alpha-benchmark [2].
> Next I will merge branch SCB-1321 into the master branch.
>
> If you have any questions, please tell me.
>
> [1] https://github.com/apache/servicecomb-pack/blob/SCB-1321/alpha/alpha-fsm/Benchmark.md
> [2] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-benchmark
>
> Lei Zhang
>
> > On Jul 9, 2019, at 8:50 AM, Willem Jiang <[email protected]> wrote:
> >
> > Hi Zhanglei,
> >
> > I agree with you: in the timeout scenario, it's hard to tell whether the
> > transaction finished successfully or not.
> > I think we could provide an extension or plugin to let the user take
> > some extra actions to do the compensation work after verifying the
> > transaction states.
> >
> > BTW, as there are some changes happening in master, maybe you can
> > consider merging the SCB-1321 branch into the master branch.
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Tue, Jul 9, 2019 at 8:21 AM Zhang Lei <[email protected]> wrote:
> >>
> >> Hi All
> >>
> >> I have completed the acceptance test for the state machine and pushed it
> >> to branch SCB-1321, and CI passes. See more feature progress here [1].
> >>
> >> In the acceptance test, the timeout behaves differently from the previous
> >> one. When the timeout occurs, the transaction enters the suspended state
> >> because we are not sure whether the sub-transaction has completed or when
> >> it will complete.
> >>
> >> For example:
> >> when booking times out, we are not sure about the execution status of car
> >> or hotel.
> >> If car or hotel sends TxEndedEvent after compensation, they will
> >> not be compensated.
> >>
> >> Alpha
> >> [x] State machine design document
> >> [x] State machine prototype
> >> [x] State machine prototype unit test
> >> [x] Receive saga events using the internal message bus
> >> [x] State machine integration test
> >> [x] Enable state machine support via parameters
> >> [x] Verify Akka persistence
> >> [ ] Verify Akka cluster reliability
> >> [ ] Save the terminated transaction data to the database
> >> [ ] Support for in-process nested global transactions
> >> [ ] Support for cross-process nested global transactions
> >> [ ] Support for query terminated transaction data by RESTful API
> >> [ ] Support for query running transaction data by RESTful API
> >> [ ] Support for query running transaction data by RESTful API
> >> [ ] Support for query suspended global transaction by RESTful API
> >> [ ] Support for compensate failed sub-transaction by RESTful API
> >>
> >> Omega Components
> >> [x] Enable state machine support via parameters
> >> [x] State machine calls omega side compensation
> >> [x] @SagaStart supports thread termination after the timeout
> >>
> >> Alpha & Omega
> >> [x] Acceptance-pack-akka-spring-demo pass
> >> [ ] Add sub-transaction timeout exception for akka acceptance test
> >> [ ] Add compensation failure for akka acceptance test
> >> [ ] Add compensation retry success for akka acceptance test
> >> [ ] Alpha single node benchmark performance test
> >> [ ] Alpha cluster benchmark performance test
> >>
> >> Tools
> >> [ ] Alpha Benchmark tools
> >>
> >> Do Next:
> >> 1. State machine metrics collection
> >> 2. Alpha Benchmark tools
> >> 3. Single alpha benchmark performance test
> >> 4. Verify Akka cluster reliability
> >>
> >> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
> >>
> >> Lei Zhang
> >>
> >>> On Jun 28, 2019, at 5:50 PM, Zhang Lei <[email protected]> wrote:
> >>>
> >>> Hi, All
> >>>
> >>> alpha-fsm has been pushed to the branch SCB-1321.
> >>>
> >>> Completed:
> >>> 1. State machine design document [1]
> >>> 2. State machine prototype
> >>> 3. State machine test case
> >>> 4. Receive saga events using the internal message bus
> >>>
> >>> Focus of the next stage of work:
> >>> In order to carry out the feasibility verification as soon as possible, I
> >>> will not consider the reliability issue for the time being.
> >>> 1. Refactor Omega components, add SagaAbortedEvent, SagaTimeoutEvent,
> >>> TxComponsitedEvent
> >>> 2. Save compensation method parameters in Actor and trigger compensation
> >>> in Actor
> >>> 3. Do not use Kafka and only verify single node alpha. The Alpha server
> >>> receives the saga event and puts it into the internal message bus.
> >>>
> >>> Planning:
> >>> 1. Persist actor data to the database when it terminates
> >>> 2. Integrate Kafka
> >>> 3. Support WAL [2] recovery mode
> >>> 4. Verify Akka cluster reliability
> >>>
> >>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
> >>> [2] https://en.wikipedia.org/wiki/Write-ahead_logging
> >>>
> >>> If you have other comments, please let us know.
> >>>
> >>> Thanks,
> >>> Lei Zhang
> >>>
> >>>> On Jun 27, 2019, at 9:50 AM, Willem Jiang <[email protected]> wrote:
> >>>>
> >>>> We just leverage the message broker to make sure Alpha gets the
> >>>> transaction event from Omega.
> >>>> In most cases Alpha doesn't need to talk back to Omega; we just need to
> >>>> make sure all the transaction messages are stored (Alpha can process
> >>>> them later).
> >>>>
> >>>> If Omega cannot talk to the message broker, Omega should abort the
> >>>> transaction processing with a transport exception.
> >>>>
> >>>> Willem Jiang
> >>>>
> >>>> Twitter: willemjiang
> >>>> Weibo: 姜宁willem
> >>>>
> >>>> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei <[email protected]> wrote:
> >>>>>
> >>>>> Hi, Zhang jun
> >>>>>
> >>>>>> I am only concerned about the recovery scan thread design.
> >>>>>> Kafka can ensure that event messages are consumed by alpha exactly, but
> >>>>>> recovery needs to know all the participating transactions' responses to
> >>>>>> decide whether to roll back or commit, so I think the scan thread is
> >>>>>> still necessary.
> >>>>>
> >>>>> I am not sure, but I think Akka's persistence can solve the problem
> >>>>> you care about.
> >>>>> Of course, this ability needs to be verified.
> >>>>>
> >>>>> Thanks,
> >>>>> Zhang Lei
> >>>>>
> >>>>>> On Jun 24, 2019, at 10:46 AM, 赵俊 <[email protected]> wrote:
> >>>>>>
> >>>>>> Hi, Zhang Lei
> >>>>>>
> >>>>>>> A1 : I think we only need to ensure that the message can be reliably
> >>>>>>> delivered to the state machine. The state machine only records state
> >>>>>>> transitions synchronously when the transaction executes normally.
> >>>>>>> At present, the compensation method based on table scan is also
> >>>>>>> asynchronous. I am not sure if I have answered your question, or
> >>>>>>> you can give me more information.
> >>>>>>
> >>>>>> If we have a mechanism that ensures the main service can correctly
> >>>>>> collect all the participating transactions' responses from alpha before
> >>>>>> commit/rollback, it is OK.
> >>>>>>
> >>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
> >>>>>>> same as before, based on the database.
> >>>>>>> A2 : I think the question you care about is how to recover when the
> >>>>>>> alpha is down. This is a little different from the current version.
> >>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
> >>>>>>> topic, one message at a time.
> >>>>>>> 2. Of course, we can also do some extra design for it, such as
> >>>>>>> logging the data to a local log file after receiving the Kafka
> >>>>>>> message, and resuming the messages by reading the data log file when
> >>>>>>> the alpha machine restarts.
> >>>>>>
> >>>>>> I am only concerned about the recovery scan thread design.
> >>>>>> Kafka can ensure that event messages are consumed by alpha exactly, but
> >>>>>> recovery needs to know all the participating transactions' responses to
> >>>>>> decide whether to roll back or commit, so I think the scan thread is
> >>>>>> still necessary.
> >>>>>>
> >>>>>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi, Zhao Jun
> >>>>>>>
> >>>>>>> Thank you for your reply!
> >>>>>>>
> >>>>>>> This design document does not elaborate on reliability aspects.
> >>>>>>>
> >>>>>>> My initial thoughts are as follows.
> >>>>>>>
> >>>>>>> Q1 : It seems that omega should hold on after consuming the event
> >>>>>>> message from Kafka instead of completing pushing message.
> >>>>>>> A1 : I think we only need to ensure that the message can be reliably
> >>>>>>> delivered to the state machine. The state machine only records state
> >>>>>>> transitions synchronously when the transaction executes normally.
> >>>>>>> At present, the compensation method based on table scan is also
> >>>>>>> asynchronous. I am not sure if I have answered your question, or
> >>>>>>> you can give me more information.
> >>>>>>>
> >>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
> >>>>>>> same as before, based on the database.
> >>>>>>> A2 : I think the question you care about is how to recover when the
> >>>>>>> alpha is down. This is a little different from the current version.
> >>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
> >>>>>>> topic, one message at a time.
> >>>>>>> 2. Of course, we can also do some extra design for it, such as
> >>>>>>> logging the data to a local log file after receiving the Kafka
> >>>>>>> message, and resuming the messages by reading the data log file when
> >>>>>>> the alpha machine restarts.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Lei Zhang
> >>>>>>>
> >>>>>>>> On Jun 23, 2019, at 7:08 AM, zhaojun <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> I have some questions about the design.
> >>>>>>>> 1. It seems that omega should hold on after consuming the event
> >>>>>>>> message from Kafka instead of completing pushing message.
> >>>>>>>> 2. We should also consider recovery; it seems that recovery is the
> >>>>>>>> same as before, based on the database.
> >>>>>>>>
> >>>>>>>> ------------------
> >>>>>>>> Zhao Jun
> >>>>>>>> Apache Sharding-Sphere & ServiceComb
> >>>>>>>>
> >>>>>>>>> On Jun 21, 2019, at 6:41 PM, Zhang Lei <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have created the alpha-fsm module on branch SCB-1321 and
> >>>>>>>>> submitted the design documentation, state machine prototype and
> >>>>>>>>> test cases.
> >>>>>>>>> If there is any problem, please let me know.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Lei Zhang
> >>>>>>>>>
> >>>>>>>>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
> >>>>>>>>>
> >>>>>>>>>> On Jun 20, 2019, at 3:25 PM, Zheng Feng <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yeah, I think Willem has created one [1] before. Do you mind if I
> >>>>>>>>>> assign this issue to you?
> >>>>>>>>>>
> >>>>>>>>>> [1] https://issues.apache.org/jira/browse/SCB-1258
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jun 20, 2019 at 2:34 PM Zhang Lei <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi, Zheng Feng
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for your advice. I will create a JIRA first and start with
> >>>>>>>>>>> the design documentation.
> >>>>>>>>>>>
> >>>>>>>>>>> Lei Zhang
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jun 19, 2019, at 8:09 PM, Zheng Feng <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks a lot for sharing this information! I think this state
> >>>>>>>>>>>> machine could be very experimental, so it would be helpful to
> >>>>>>>>>>>> create an experimental branch for this module rather than adding
> >>>>>>>>>>>> it to the master branch.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 19, 2019 at 5:42 PM Zhang Lei <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I have completed some of the design and prototype on my GitHub.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In the design document [1] my original idea was that a
> >>>>>>>>>>>>> transaction consisted of a SagaActor and several TxActors; later
> >>>>>>>>>>>>> TxActor was removed to reduce implementation complexity.
> >>>>>>>>>>>>> I haven't had time to modify the documentation yet, but the
> >>>>>>>>>>>>> SagaActor state machine [2] is up to date.
> >>>>>>>>>>>>> Here you can see the test cases of SagaActor [3]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://github.com/coolbeevip/playground/tree/master/state_machine_demo/saga-akkafsm
> >>>>>>>>>>>>> [2] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/assets/saga_state_diagram.png
> >>>>>>>>>>>>> [3] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/src/test/java/coolbeevip/playgroud/statemachine/saga/SagaActorTest.java
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Lei Zhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 19, 2019, at 2:34 PM, zhaojun <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If we use AKKA, how can we design the actors, and how can we
> >>>>>>>>>>>>>> guarantee that omega will receive the message synchronously?
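
For readers following the timeout discussion in this thread (a timed-out saga enters a suspended state instead of being compensated automatically), here is a minimal Java sketch of that idea. It is not the alpha-fsm implementation: the class name, the state names, and the enum constants are hypothetical and only loosely mirror the events mentioned in the thread (TxEndedEvent, SagaTimeoutEvent, SagaAbortedEvent). Recording late events without acting on them is also an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the alpha-fsm code from branch SCB-1321.
final class SagaTimeoutSketch {
  // State names are assumptions chosen for this example.
  enum State { ACTIVE, SUSPENDED, COMMITTED, COMPENSATED }

  // Simplified stand-ins for the saga events named in the thread.
  enum Event { TX_ENDED, SAGA_ENDED, SAGA_ABORTED, SAGA_TIMEOUT }

  private State state = State.ACTIVE;
  private final List<Event> lateEvents = new ArrayList<>();

  public State state() {
    return state;
  }

  public List<Event> lateEvents() {
    return lateEvents;
  }

  // Applies one event and returns the resulting state.
  public State on(Event event) {
    if (state != State.ACTIVE) {
      // The saga is suspended or finished: keep the event for later
      // inspection, but do not trigger any automatic action.
      lateEvents.add(event);
      return state;
    }
    switch (event) {
      case SAGA_TIMEOUT:
        // We cannot know whether sub-transactions completed, so suspend.
        state = State.SUSPENDED;
        break;
      case SAGA_ABORTED:
        state = State.COMPENSATED;
        break;
      case SAGA_ENDED:
        state = State.COMMITTED;
        break;
      case TX_ENDED:
        // A sub-transaction finished; the saga itself stays active.
        break;
    }
    return state;
  }
}
```

In this sketch, a TX_ENDED that arrives after SAGA_TIMEOUT leaves the saga in SUSPENDED and only appears in lateEvents(), which matches the point made in the thread that such sub-transactions are not compensated automatically. A follow-up step (for example the extension or RESTful API ideas mentioned above) would have to decide what to do with them.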
