Re: About alpha-fsm progress

Zhang Lei Tue, 09 Jul 2019 07:10:46 -0700

Hi, all

The state machine-based Alpha improves performance by an order of magnitude. I
have pushed Alpha's benchmark report [1] and added a stress-test tool module,
alpha-benchmark [2].
Next, I will merge branch SCB-1321 into the master branch.


If you have any questions, please let me know.

[1] https://github.com/apache/servicecomb-pack/blob/SCB-1321/alpha/alpha-fsm/Benchmark.md
[2] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-benchmark

Lei Zhang

> On Jul 9, 2019, at 8:50 AM, Willem Jiang <willem.ji...@gmail.com> wrote:
> 
> Hi Zhanglei,
> 
> I agree with you, in the timeout scenario, it's hard to tell if the
> transaction is finished successfully or not.
> I think we could provide an extension or plugin to let the user do
> some extra actions to do the compensation work after verifying the
> transaction states.
> 
> BTW, as there are some changes happening in master, maybe you can
> consider merging the SCB-1321 branch into the master branch.
> 
> Willem Jiang
> 
> Twitter: willemjiang
> Weibo: 姜宁willem
> 
> On Tue, Jul 9, 2019 at 8:21 AM Zhang Lei <zhang_...@boco.com.cn> wrote:
>> 
>> Hi, All
>> 
>> I have completed the acceptance test for the state machine and pushed it to
>> branch SCB-1321, and CI passes. See the feature progress in more detail here [1].
>> 
>> In the acceptance test, timeout handling differs from the previous
>> implementation. When a timeout occurs, the transaction enters the suspended
>> state, because we cannot be sure whether the sub-transaction has completed
>> or when it will complete.
>> 
>> For example:
>> When booking times out, we cannot be sure of the execution status of car or
>> hotel. If car or hotel sends a TxEndedEvent after compensation has run, it
>> will not be compensated.
>> 
>> Alpha
>>  [x]  State machine design document
>>  [x]  State machine prototype
>>  [x]  State machine prototype unit test
>>  [x]  Receive saga events using the internal message bus
>>  [x]  State machine integration test
>>  [x]  Enable state machine support via parameters
>>  [x]  Verify Akka persistence
>>  [ ]  Verify Akka cluster reliability
>>  [ ]  Save the terminated transaction data to the database
>>  [ ]  Support for in-process nested global transactions
>>  [ ]  Support for cross-process nested global transactions
>>  [ ]  Support querying terminated transaction data via RESTful API
>>  [ ]  Support querying running transaction data via RESTful API
>>  [ ]  Support querying suspended global transactions via RESTful API
>>  [ ]  Support compensating failed sub-transactions via RESTful API
>> 
>> Omega Components
>>  [x]  Enable state machine support via parameters
>>  [x]  State machine calls omega side compensation
>>  [x]  @SagaStart supports thread termination after the timeout
>> 
>> Alpha & Omega
>>  [x]  Acceptance-pack-akka-spring-demo passes
>>  [ ]  Add sub-transaction timeout exception for akka acceptance test
>>  [ ]  Add compensation failure for akka acceptance test
>>  [ ]  Add compensation retry success for akka acceptance test
>>  [ ]  Alpha single node benchmark performance test
>>  [ ]  Alpha cluster benchmark performance test
>> 
>> Tools
>>  [ ]  Alpha Benchmark tools
>> 
>> Do Next:
>> 1. State machine metrics collection
>> 2. Alpha Benchmark tools
>> 3. Single alpha benchmark performance test
>> 4. Verify Akka cluster reliability
>> 
>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>> 
>> Lei Zhang
>> 
>> 
>>> On Jun 28, 2019, at 5:50 PM, Zhang Lei <zhang_...@boco.com.cn> wrote:
>>> 
>>> Hi, All
>>> 
>>> alpha-fsm has been pushed to the branch SCB-1321
>>> 
>>> Completed:
>>> 1. State machine design document[1]
>>> 2. State machine prototype
>>> 3. State machine test case
>>> 4. Receive saga events using the internal message bus
>>> 
>>> Focus of the next stage of work:
>>> In order to carry out the feasibility verification as soon as possible, I
>>> will not consider reliability for the time being.
>>> 1. Refactor the Omega components; add SagaAbortedEvent, SagaTimeoutEvent,
>>> TxCompensatedEvent
>>> 2. Save the compensation method parameters in the Actor and trigger
>>> compensation from the Actor
>>> 3. Do not use Kafka; only verify a single-node Alpha. The Alpha server
>>> receives the saga event and puts it on the internal message bus.
>>> 
>>> Planning:
>>> 1. Persist actor data to the database when it terminates
>>> 2. Integrate Kafka
>>> 3. Support WAL[2] recovery mode
>>> 4. Verify Akka cluster reliability
>>> 
>>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>>> [2] https://en.wikipedia.org/wiki/Write-ahead_logging
>>> 
>>> If you have other comments, please let us know.
>>> 
>>> Thanks,
>>> Lei Zhang
>>> 
>>>> On Jun 27, 2019, at 9:50 AM, Willem Jiang <willem.ji...@gmail.com> wrote:
>>>> 
>>>> We just leverage the message broker to make sure Alpha gets the
>>>> transaction events from Omega.
>>>> In most cases Alpha doesn't need to talk back to Omega; we just need to
>>>> make sure all the transaction messages are stored (Alpha can process
>>>> them later).
>>>> 
>>>> If Omega cannot talk to the message broker, Omega should abort the
>>>> transaction processing with a transport exception.
>>>> 
>>>> Willem Jiang
>>>> 
>>>> Twitter: willemjiang
>>>> Weibo: 姜宁willem
>>>> 
>>>> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei <zhang_...@boco.com.cn> wrote:
>>>>> 
>>>>> Hi, Zhao Jun
>>>>> 
>>>>>> My only concern is the recovery scan thread design.
>>>>>> Kafka can ensure that each event message is consumed by Alpha exactly
>>>>>> once, but recovery needs to know the responses of all participating
>>>>>> transactions to decide whether to roll back or commit, so I think the
>>>>>> scan thread is still necessary.
>>>>> 
>>>>> I am not sure, but I think Akka's persistence can solve the problem you
>>>>> are concerned about.
>>>>> Of course, this capability still needs to be verified.
>>>>> 
>>>>> Thanks,
>>>>> Zhang Lei
>>>>> 
>>>>>> On Jun 24, 2019, at 10:46 AM, 赵俊 <zhaoju...@jd.com> wrote:
>>>>>> 
>>>>>> Hi, Zhang Lei
>>>>>> 
>>>>>>> A1 : I think we only need to ensure that messages are reliably
>>>>>>> delivered to the state machine. When the transaction executes normally,
>>>>>>> the state machine only records state transitions synchronously. The
>>>>>>> current table-scan-based compensation is also asynchronous. I am not
>>>>>>> sure whether I have answered your question; if not, please give me
>>>>>>> more information.
>>>>>> 
>>>>>> If we have a mechanism that ensures the main service can correctly
>>>>>> collect the responses of all participating transactions from Alpha
>>>>>> before commit/rollback, that is OK.
>>>>>> 
>>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
>>>>>>> same as before, based on the database.
>>>>>>> A2 : I think your question is how to recover when Alpha is down. This
>>>>>>> is a little different from the current version.
>>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>>>>> topic, one message at a time.
>>>>>>> 2. Of course, we can also add some extra design for it, such as writing
>>>>>>> each Kafka message to a local data log file after receiving it, and
>>>>>>> replaying the messages from that file when the Alpha machine restarts.
>>>>>> 
>>>>>> My only concern is the recovery scan thread design.
>>>>>> Kafka can ensure that each event message is consumed by Alpha exactly
>>>>>> once, but recovery needs to know the responses of all participating
>>>>>> transactions to decide whether to roll back or commit, so I think the
>>>>>> scan thread is still necessary.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei <zhang_...@boco.com.cn> wrote:
>>>>>>> 
>>>>>>> Hi, Zhao Jun
>>>>>>> 
>>>>>>> Thank you for your reply!
>>>>>>> 
>>>>>>> This design document does not elaborate on reliability aspects.
>>>>>>> 
>>>>>>> My initial thoughts are as follows:
>>>>>>> 
>>>>>>> Q1 : It seems that Omega should wait until the event message has been
>>>>>>> consumed from Kafka, rather than finishing as soon as the message is
>>>>>>> pushed.
>>>>>>> A1 : I think we only need to ensure that messages are reliably
>>>>>>> delivered to the state machine. When the transaction executes normally,
>>>>>>> the state machine only records state transitions synchronously. The
>>>>>>> current table-scan-based compensation is also asynchronous. I am not
>>>>>>> sure whether I have answered your question; if not, please give me
>>>>>>> more information.
>>>>>>> 
>>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
>>>>>>> same as before, based on the database.
>>>>>>> A2 : I think your question is how to recover when Alpha is down. This
>>>>>>> is a little different from the current version.
>>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>>>>> topic, one message at a time.
>>>>>>> 2. Of course, we can also add some extra design for it, such as writing
>>>>>>> each Kafka message to a local data log file after receiving it, and
>>>>>>> replaying the messages from that file when the Alpha machine restarts.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Lei Zhang
>>>>>>> 
>>>>>>>> On Jun 23, 2019, at 7:08 AM, zhaojun <zhaoju...@126.com> wrote:
>>>>>>>> 
>>>>>>>> I have some questions about the design.
>>>>>>>> 1. It seems that Omega should wait until the event message has been
>>>>>>>> consumed from Kafka, rather than finishing as soon as the message is
>>>>>>>> pushed.
>>>>>>>> 2. We should also consider recovery; it seems that recovery is the
>>>>>>>> same as before, based on the database.
>>>>>>>> 
>>>>>>>> ------------------
>>>>>>>> Zhao Jun
>>>>>>>> Apache Sharding-Sphere & ServiceComb
>>>>>>>> 
>>>>>>>>> On Jun 21, 2019, at 6:41 PM, Zhang Lei <zhang_...@boco.com.cn> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I have created the alpha-fsm module on branch SCB-1321 and submitted 
>>>>>>>>> the design documentation, state machine prototype and test cases.
>>>>>>>>> If there is any problem, please let me know.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Lei Zhang
>>>>>>>>> 
>>>>>>>>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>>>>>>>>> 
>>>>>>>>>> On Jun 20, 2019, at 3:25 PM, Zheng Feng <zh.f...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yeah, I think Willem has already created one [1]. Do you mind if I
>>>>>>>>>> assign this issue to you?
>>>>>>>>>> 
>>>>>>>>>> [1] https://issues.apache.org/jira/browse/SCB-1258
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jun 20, 2019 at 2:34 PM Zhang Lei <zhang_...@boco.com.cn> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi, Zheng Feng
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for your advice. I will create a JIRA first and start with
>>>>>>>>>>> the design documentation.
>>>>>>>>>>> 
>>>>>>>>>>> Lei Zhang
>>>>>>>>>>> 
>>>>>>>>>>>> On Jun 19, 2019, at 8:09 PM, Zheng Feng <zh.f...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks a lot for sharing this information! I think this state
>>>>>>>>>>>> machine could be very experimental, so it would be helpful to
>>>>>>>>>>>> create an experimental branch for this module rather than adding
>>>>>>>>>>>> it to the master branch.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 19, 2019 at 5:42 PM Zhang Lei <cool...@qq.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I have completed some of the design and a prototype on my GitHub.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In the design document [1], my original idea was that a
>>>>>>>>>>>>> transaction consisted of a SagaActor and several TxActors; later,
>>>>>>>>>>>>> TxActor was removed to reduce implementation complexity.
>>>>>>>>>>>>> I haven't had time to update the documentation yet, but the
>>>>>>>>>>>>> SagaActor state machine [2] is up to date.
>>>>>>>>>>>>> Here you can see the test cases for SagaActor [3].
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] https://github.com/coolbeevip/playground/tree/master/state_machine_demo/saga-akkafsm
>>>>>>>>>>>>> [2] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/assets/saga_state_diagram.png
>>>>>>>>>>>>> [3] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/src/test/java/coolbeevip/playgroud/statemachine/saga/SagaActorTest.java
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Lei Zhang
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 19, 2019, at 2:34 PM, zhaojun <zhaoju...@126.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If we use Akka, how should we design the actors, and how can we
>>>>>>>>>>>>>> guarantee that Omega receives messages synchronously?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>
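
The timeout behaviour described in the thread (a timed-out transaction is
suspended rather than compensated, because the outcome of its sub-transactions
is unknown) could be sketched roughly as follows. This is only an illustrative
sketch under stated assumptions, not the alpha-fsm code: the class
SagaTimeoutSketch, the state names and the run helper are hypothetical, and
real alpha-fsm uses Akka actors rather than a plain switch. TX_ENDED,
SAGA_ABORTED and SAGA_TIMEOUT stand for the TxEndedEvent, SagaAbortedEvent and
SagaTimeoutEvent mentioned in the thread; the other event names are assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the timeout handling discussed in the thread.
// It is NOT the alpha-fsm implementation.
class SagaTimeoutSketch {

    // State names are illustrative assumptions.
    enum State { IDLE, ACTIVE, COMMITTED, COMPENSATED, SUSPENDED }

    // TX_ENDED, SAGA_ABORTED and SAGA_TIMEOUT mirror event names from the
    // thread; SAGA_STARTED and SAGA_ENDED are assumed for this sketch.
    enum Event { SAGA_STARTED, TX_ENDED, SAGA_ENDED, SAGA_ABORTED, SAGA_TIMEOUT }

    // Returns the next state, or throws if the event is not allowed.
    static State next(State current, Event event) {
        switch (current) {
            case IDLE:
                if (event == Event.SAGA_STARTED) {
                    return State.ACTIVE;
                }
                break;
            case ACTIVE:
                switch (event) {
                    case TX_ENDED:
                        return State.ACTIVE;      // a sub-transaction finished
                    case SAGA_ENDED:
                        return State.COMMITTED;
                    case SAGA_ABORTED:
                        return State.COMPENSATED; // outcome known: compensate
                    case SAGA_TIMEOUT:
                        return State.SUSPENDED;   // outcome unknown: suspend
                    default:
                        break;
                }
                break;
            case SUSPENDED:
                // Assumption: late events (e.g. a TxEndedEvent from car or
                // hotel) never trigger automatic compensation here.
                return State.SUSPENDED;
            default:
                break;
        }
        throw new IllegalStateException(
            "Unexpected " + event + " in state " + current);
    }

    // Folds a sequence of events over the state machine, starting from IDLE.
    static State run(List<Event> events) {
        State state = State.IDLE;
        for (Event event : events) {
            state = next(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Booking times out, then a late TX_ENDED arrives from car or hotel.
        State result = run(Arrays.asList(
            Event.SAGA_STARTED, Event.TX_ENDED,
            Event.SAGA_TIMEOUT, Event.TX_ENDED));
        System.out.println(result); // prints SUSPENDED
    }
}
```

In this sketch a suspended transaction stays suspended until something outside
the state machine resolves it, which matches the idea in the thread of letting
the user verify the transaction state before doing any compensation work.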
