Re: About alpha-fsm progress

Zhang Lei Mon, 08 Jul 2019 17:22:27 -0700

Hi All,

I have completed the acceptance test for the state machine and pushed it to 
branch SCB-1321; CI passes. See more feature progress here [1].


In the acceptance test, timeout handling differs from the previous behavior. 
When a timeout occurs, the transaction enters the suspended state, because we 
cannot be sure whether the sub-transaction has completed or when it will 
complete.

For example, when booking times out, we do not know the execution status of 
car or hotel. If car or hotel sends TxEndedEvent after compensation has 
already run, it will not be compensated.
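
A minimal sketch of the timeout behavior described above, written in plain Java 
rather than Akka. The class, state, and event names (SagaTimeoutSketch, ACTIVE, 
SUSPENDED, SAGA_TIMEOUT, and so on) are illustrative assumptions and not the 
actual alpha-fsm code. Recording late events for later inspection is also an 
assumption of this sketch; the point is only that a timeout leads to SUSPENDED 
and that nothing is committed or compensated automatically afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified saga state machine (not the alpha-fsm implementation).
final class SagaTimeoutSketch {
  enum State { ACTIVE, SUSPENDED, COMMITTED }
  enum Event { TX_ENDED, SAGA_ENDED, SAGA_TIMEOUT }

  private State state = State.ACTIVE;
  // Events that arrive after the saga was suspended (assumption: kept for manual review).
  private final List<Event> lateEvents = new ArrayList<>();

  State state() { return state; }
  List<Event> lateEvents() { return lateEvents; }

  void on(Event event) {
    switch (state) {
      case ACTIVE:
        if (event == Event.SAGA_TIMEOUT) {
          // Sub-transaction status is unknown, so suspend instead of compensating.
          state = State.SUSPENDED;
        } else if (event == Event.SAGA_ENDED) {
          state = State.COMMITTED;
        }
        // TX_ENDED keeps the saga ACTIVE.
        break;
      case SUSPENDED:
        // A late TxEndedEvent must not trigger an automatic decision; just record it.
        lateEvents.add(event);
        break;
      default:
        // COMMITTED is terminal in this sketch; ignore further events.
        break;
    }
  }

  public static void main(String[] args) {
    SagaTimeoutSketch saga = new SagaTimeoutSketch();
    saga.on(Event.TX_ENDED);      // e.g. car finished
    saga.on(Event.SAGA_TIMEOUT);  // booking timed out
    saga.on(Event.TX_ENDED);      // hotel reports late
    System.out.println(saga.state() + " " + saga.lateEvents());
  }
}
```

In this sketch the late TxEndedEvent from hotel stays visible in lateEvents(), 
so an operator or a later recovery step can decide how to handle it.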

Alpha
  [x]  State machine design document
  [x]  State machine prototype
  [x]  State machine prototype unit test
  [x]  Receive saga events using the internal message bus
  [x]  State machine integration test
  [x]  Enable state machine support via parameters 
  [x]  Verify Akka persistence
  [ ]  Verify Akka cluster reliability
  [ ]  Save the terminated transaction data to the database
  [ ]  Support for in-process nested global transactions
  [ ]  Support for cross-process nested global transactions
  [ ]  Support querying terminated transaction data via RESTful API
  [ ]  Support querying running transaction data via RESTful API
  [ ]  Support querying suspended global transactions via RESTful API
  [ ]  Support compensating failed sub-transactions via RESTful API

Omega Components
  [x]  Enable state machine support via parameters
  [x]  State machine calls omega side compensation
  [x]  @SagaStart supports thread termination after the timeout

Alpha & Omega
  [x]  Acceptance-pack-akka-spring-demo passes
  [ ]  Add sub-transaction timeout exception for akka acceptance test
  [ ]  Add compensation failure for akka acceptance test
  [ ]  Add compensation retry success for akka acceptance test 
  [ ]  Alpha single node benchmark performance test
  [ ]  Alpha cluster benchmark performance test

Tools
  [ ]  Alpha Benchmark tools

Next steps:
1. State machine metrics collection
2. Alpha Benchmark tools
3. Single alpha benchmark performance test
4. Verify Akka cluster reliability

[1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm 

Lei Zhang


> On Jun 28, 2019, at 5:50 PM, Zhang Lei <[email protected]> wrote:
> 
> Hi, All
> 
> alpha-fsm has been pushed to the branch SCB-1321
> 
> Completed:
> 1. State machine design document[1]
> 2. State machine prototype
> 3. State machine test case
> 4. Receive saga events using the internal message bus
> 
> Focus of the next stage:
> In order to carry out the feasibility verification as soon as possible, I 
> will not consider the reliability issue for the time being.
> 1. Refactor Omega components, add SagaAbortedEvent, SagaTimeoutEvent, 
> TxComponsitedEvent
> 2. Save compensation method parameters in Actor and trigger compensation in 
> Actor
> 3. Do not use Kafka and only verify a single-node Alpha. The Alpha server 
> receives the saga event and puts it onto the internal message bus.
> 
> Planning:
> 1. Persist actor data to the database when it terminates
> 2. Integrate Kafka
> 3. Support WAL[2] recovery mode
> 4. Verify Akka cluster reliability
> 
> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm 
> [2] https://en.wikipedia.org/wiki/Write-ahead_logging 
> 
> If you have other comments, please let us know.
> 
> Thanks,
> Lei Zhang
> 
>> On Jun 27, 2019, at 9:50 AM, Willem Jiang <[email protected]> wrote:
>> 
>> We just leverage the message broker to make sure Alpha gets the
>> transaction events from Omega.
>> In most cases Alpha doesn't need to talk back to Omega; we just need to
>> make sure all the transaction messages are stored (Alpha can process them
>> later).
>> 
>> If Omega cannot talk to the message broker, Omega should abort the
>> transaction processing with a transport exception.
>> 
>> Willem Jiang
>> 
>> Twitter: willemjiang
>> Weibo: 姜宁willem
>> 
>> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei <[email protected]> wrote:
>>> 
>>> Hi, Zhao Jun
>>> 
>>>> I am mainly concerned about the design of the recovery scan thread.
>>>> Kafka can ensure that event messages are consumed by Alpha exactly, but 
>>>> recovery needs to know the responses of all participating transactions to 
>>>> decide whether to roll back or commit, so I think the scan thread is still 
>>>> necessary.
>>> 
>>> I am not sure, but I think Akka's persistence can solve this problem you 
>>> care about.
>>> Of course, this ability needs to be verified
>>> 
>>> Thanks,
>>> Zhang Lei
>>> 
>>>> On Jun 24, 2019, at 10:46 AM, 赵俊 <[email protected]> wrote:
>>>> 
>>>> Hi, Zhang Lei
>>>> 
>>>>> A1 : I think we only need to ensure that the message can be reliably 
>>>>> delivered to the state machine. The state machine only records state 
>>>>> transitions synchronously when the transaction executes normally. At 
>>>>> present, the compensation method based on table scan is also 
>>>>> asynchronous. I am not sure whether I have answered your question; if 
>>>>> not, please give me more information.
>>>> 
>>>> If we have a mechanism that ensures the main service can correctly collect 
>>>> the responses of all participating transactions from Alpha before 
>>>> commit/rollback, it is OK.
>>>> 
>>>>> Q2 : We should also consider recovery; it seems that recovery is the 
>>>>> same as before, based on the database.
>>>>> A2 : I think the question you care about is how to recover when Alpha 
>>>>> is down; this is a little different from the current version.
>>>>> 1. We can rely on Kafka's reliability and control the offset of the 
>>>>> topic, one message at a time.
>>>>> 2. Of course, we can also do some extra design for it, such as writing 
>>>>> the data to a local log file after receiving the Kafka message, and 
>>>>> replaying messages from that log file when the Alpha machine restarts.
>>>> 
>>>> I am mainly concerned about the design of the recovery scan thread.
>>>> Kafka can ensure that event messages are consumed by Alpha exactly, but 
>>>> recovery needs to know the responses of all participating transactions to 
>>>> decide whether to roll back or commit, so I think the scan thread is still 
>>>> necessary.
>>>> 
>>>> 
>>>> 
>>>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei <[email protected]> wrote:
>>>>> 
>>>>> Hi, Zhao Jun
>>>>> 
>>>>> Thank you for your reply!
>>>>> 
>>>>> This design document does not elaborate on reliability aspects.
>>>>> 
>>>>> My initial thoughts are as follows:
>>>>> 
>>>>> Q1 : It seems that omega should hold on after consuming the event message 
>>>>> from Kafka instead of completing pushing message
>>>>> A1 : I think we only need to ensure that the message can be reliably 
>>>>> delivered to the state machine. The state machine only records state 
>>>>> transitions synchronously when the transaction executes normally. At 
>>>>> present, the compensation method based on table scan is also 
>>>>> asynchronous. I am not sure whether I have answered your question; if 
>>>>> not, please give me more information.
>>>>> 
>>>>> Q2 : We should also consider recovery; it seems that recovery is the 
>>>>> same as before, based on the database.
>>>>> A2 : I think the question you care about is how to recover when Alpha 
>>>>> is down; this is a little different from the current version.
>>>>> 1. We can rely on Kafka's reliability and control the offset of the 
>>>>> topic, one message at a time.
>>>>> 2. Of course, we can also do some extra design for it, such as writing 
>>>>> the data to a local log file after receiving the Kafka message, and 
>>>>> replaying messages from that log file when the Alpha machine restarts.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Lei Zhang
>>>>> 
>>>>>> On Jun 23, 2019, at 7:08 AM, zhaojun <[email protected]> wrote:
>>>>>> 
>>>>>> I have some questions about the design.
>>>>>> 1. It seems that omega should hold on after consuming the event message 
>>>>>> from Kafka instead of completing pushing message.
>>>>>> 2. We should also consider recovery; it seems that recovery is the 
>>>>>> same as before, based on the database.
>>>>>> 
>>>>>> ------------------
>>>>>> Zhao Jun
>>>>>> Apache Sharding-Sphere & ServiceComb
>>>>>> 
>>>>>>> On Jun 21, 2019, at 6:41 PM, Zhang Lei <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I have created the alpha-fsm module on branch SCB-1321 and submitted 
>>>>>>> the design documentation, state machine prototype and test cases.
>>>>>>> If there is any problem, please let me know.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Lei Zhang
>>>>>>> 
>>>>>>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>>>>>>> 
>>>>>>>> On Jun 20, 2019, at 3:25 PM, Zheng Feng <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Yeah, I think Willem has created one [1] before; do you mind if I
>>>>>>>> assign this issue to you?
>>>>>>>> 
>>>>>>>> [1] https://issues.apache.org/jira/browse/SCB-1258
>>>>>>>> 
>>>>>>>> On Thu, Jun 20, 2019 at 2:34 PM Zhang Lei <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi, Zheng Feng
>>>>>>>>> 
>>>>>>>>> Thanks for your advice, I will create a JIRA first and start with the
>>>>>>>>> design documentation.
>>>>>>>>> 
>>>>>>>>> Lei Zhang
>>>>>>>>> 
>>>>>>>>>> On Jun 19, 2019, at 8:09 PM, Zheng Feng <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks a lot for sharing this information! I think this state machine
>>>>>>>>>> could be very experimental, so it would be helpful to create an
>>>>>>>>>> experimental branch for this module rather than adding it to the
>>>>>>>>>> master branch.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 19, 2019 at 5:42 PM Zhang Lei <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I have completed some of the design and prototype in my github.
>>>>>>>>>>> 
>>>>>>>>>>> In the design document [1], my original idea was that a transaction
>>>>>>>>>>> consisted of a SagaActor and several TxActors; TxActor was later
>>>>>>>>>>> removed to reduce implementation complexity.
>>>>>>>>>>> I haven't had time to update the documentation yet, but the SagaActor
>>>>>>>>>>> state machine [2] is up to date.
>>>>>>>>>>> You can see the SagaActor test cases here [3].
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://github.com/coolbeevip/playground/tree/master/state_machine_demo/saga-akkafsm
>>>>>>>>>>> [2] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/assets/saga_state_diagram.png
>>>>>>>>>>> [3] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/src/test/java/coolbeevip/playgroud/statemachine/saga/SagaActorTest.java
>>>>>>>>>>> 
>>>>>>>>>>> Lei Zhang
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Jun 19, 2019, at 2:34 PM, zhaojun <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> If we use Akka, how should we design the actors, and how can we 
>>>>>>>>>>>> guarantee that Omega receives the messages synchronously?
