Re: About alpha-fsm progress

Zhang Lei Fri, 28 Jun 2019 09:01:01 -0700

Hi, Feng Zheng

Thank you for the reply :)


> On Jun 28, 2019, at 7:49 PM, Zheng Feng <[email protected]> wrote:
> 
> Thanks Zhang Lei,
> 
> On Fri, Jun 28, 2019 at 5:50 PM Zhang Lei <[email protected]> wrote:
> 
>> Hi, All
>> 
>> alpha-fsm has been pushed to the branch SCB-1321
>> 
>> Completed:
>> 1. State machine design document[1]
>> 2. State machine prototype
>> 3. State machine test case
>> 4. Receive saga events using the internal message bus
>> 
>> Focus of the next stage of work:
>> To carry out the feasibility verification as soon as possible, I will not
>> consider reliability issues for the time being.
>> 1. Refactor the Omega components; add SagaAbortedEvent, SagaTimeoutEvent,
>> TxComponsitedEvent
>> 2. Save compensation method parameters in the Actor and trigger compensation
>> from the Actor
>> 3. Do not use Kafka and only verify a single-node Alpha. The Alpha server
>> receives the saga event and puts it onto the internal message bus.
>> 
> It looks good to me !
> 
>> 
>> Planning:
>> 1. Persist actor data to the database when it terminates
>> 
> What is the actor data? Is it all the events?

The data contains the details of a global transaction and its
sub-transactions. It consists of three parts:
1. Global transaction: id, start time, end time, status.
2. Sub-transactions: id, start time, end time, status.
3. All events received by the actor.
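
To make the three parts concrete, here is a rough Java sketch of how the
data held by the actor could be shaped. All class, field and status names
below are hypothetical and are not taken from the alpha-fsm module; the
event names are plain strings used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the alpha-fsm source.
enum TxState { ACTIVE, COMMITTED, COMPENSATED, FAILED }

// Shared shape of parts 1 and 2: id, start time, end time, status.
class TxRecord {
  final String id;
  final long startTime;
  long endTime;                       // 0 while the transaction is running
  TxState status = TxState.ACTIVE;

  TxRecord(String id, long startTime) {
    this.id = id;
    this.startTime = startTime;
  }

  void finish(TxState finalStatus, long time) {
    this.status = finalStatus;
    this.endTime = time;
  }
}

class SagaData {
  // Part 1: the global transaction.
  final TxRecord global;
  // Part 2: sub-transactions, keyed by their id, in arrival order.
  final Map<String, TxRecord> subTransactions = new LinkedHashMap<>();
  // Part 3: every event the actor has received, in order.
  final List<String> events = new ArrayList<>();

  SagaData(String globalTxId, long startTime) {
    this.global = new TxRecord(globalTxId, startTime);
  }

  TxRecord startSub(String localTxId, long time) {
    TxRecord tx = new TxRecord(localTxId, time);
    subTransactions.put(localTxId, tx);
    return tx;
  }

  void record(String eventType) {
    events.add(eventType);
  }
}
```

When the actor terminates, an object of this shape is what would be written
to the database in planning item 1.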

> 
>> 2. Integrate Kafka
>> 
> So we can use Kafka as a message broker, and the invocations between
> Omega and Alpha will become async?

This part needs to be discussed. My idea is not to change the communication
protocol between Omega and Alpha. After an Alpha receives an event, it only
distributes the event to the other Alpha instances through Kafka, so that
Alpha can be scaled out dynamically. This architecture diagram gives an
intuitive description:
https://github.com/apache/servicecomb-pack/raw/SCB-1321/docs/fsm/assets/fsm.png
If we put a message broker between Omega and Alpha, then we also need to
consider how Alpha would call the compensation methods on Omega.
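
A small Java sketch of the idea, under my own assumptions: the receiving
Alpha keeps the existing Omega-facing endpoint and simply republishes each
event to a topic, using the global transaction id as the message key so
that all events of one transaction stay together and in order. FakeTopic is
an in-memory stand-in, not the Kafka client API, and it uses
String.hashCode() rather than Kafka's own key hashing; every name here is
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a partitioned topic; NOT the Kafka client API.
class FakeTopic {
  private final List<List<String>> partitions = new ArrayList<>();

  FakeTopic(int partitionCount) {
    for (int i = 0; i < partitionCount; i++) {
      partitions.add(new ArrayList<>());
    }
  }

  // Same key always maps to the same partition.
  int partitionFor(String key) {
    return Math.floorMod(key.hashCode(), partitions.size());
  }

  void send(String key, String event) {
    partitions.get(partitionFor(key)).add(key + ":" + event);
  }

  List<String> partition(int index) {
    return partitions.get(index);
  }
}

// Hypothetical receiving side of an Alpha instance.
class AlphaReceiver {
  private final FakeTopic topic;

  AlphaReceiver(FakeTopic topic) {
    this.topic = topic;
  }

  // Called from the existing Omega-to-Alpha endpoint; the protocol that
  // Omega sees does not change. The event is only forwarded to the topic.
  void onSagaEvent(String globalTxId, String eventType) {
    topic.send(globalTxId, eventType);
  }
}
```

With this shape, Omega can talk to any Alpha instance, and whichever Alpha
consumes a given partition sees the complete event sequence of the
transactions keyed to it.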

> 
>> 3. Support WAL[2] recovery mode
>> 
> Well, it looks interesting. Is it supported by Akka?
> 

No, it has nothing to do with Akka.
To avoid losing events when the system crashes, Alpha first persists each
event to disk after receiving it, then sends the message to the actor and
records the message pointer. When Alpha restarts, it first reads the event
pointer from disk and resends events to the actor starting from the pointer
position.
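
The flow above can be sketched in Java roughly as follows. This is only a
minimal illustration under my own assumptions, not a design for the real
module: one event per line in a log file, a second file holding the number
of events already handed to the actor, no fsync, and events that contain no
newline. The class and file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical write-ahead log in front of the actor.
class EventWal {
  private final Path logFile;       // append-only, one event per line
  private final Path pointerFile;   // count of events already delivered
  private final Consumer<String> actor;

  EventWal(Path dir, Consumer<String> actor) {
    this.logFile = dir.resolve("events.log");
    this.pointerFile = dir.resolve("events.ptr");
    this.actor = actor;
  }

  static Path newTempDir() {
    try {
      return Files.createTempDirectory("alpha-wal");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Step 1: persist the event before anything else happens.
  void persist(String event) {
    try {
      Files.write(logFile, (event + "\n").getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Normal path: persist, deliver to the actor, then advance the pointer.
  void onEvent(String event) {
    persist(event);
    actor.accept(event);
    writePointer(readPointer() + 1);
  }

  // Restart path: resend every event at or after the pointer position.
  int recover() {
    List<String> events = readLog();
    int pointer = readPointer();
    for (int i = pointer; i < events.size(); i++) {
      actor.accept(events.get(i));
      writePointer(i + 1);
    }
    return Math.max(0, events.size() - pointer);
  }

  int readPointer() {
    try {
      if (!Files.exists(pointerFile)) {
        return 0;
      }
      byte[] raw = Files.readAllBytes(pointerFile);
      return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void writePointer(int value) {
    try {
      Files.write(pointerFile,
          Integer.toString(value).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private List<String> readLog() {
    try {
      if (!Files.exists(logFile)) {
        return new ArrayList<>();
      }
      return Files.readAllLines(logFile, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Note that a crash between delivery and the pointer update makes the same
event arrive twice after restart, so with this scheme the actor has to
tolerate duplicate events.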

>> 4. Verify Akka cluster reliability
>> 
>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>> [2] https://en.wikipedia.org/wiki/Write-ahead_logging
>> 
>> If you have other comments, please let us know.
>> 
> Good luck !
> 
>> 
>> Thanks,
>> Lei Zhang
>> 
>>> On Jun 27, 2019, at 9:50 AM, Willem Jiang <[email protected]> wrote:
>>> 
>>> We just leverage the message broker to make sure Alpha gets the
>>> transaction events from Omega.
>>> In most cases Alpha doesn't need to talk back to Omega; we just need to
>>> make sure all the transaction messages are stored (Alpha can process
>>> them later).
>>> 
>>> If Omega cannot talk to the message broker, Omega should abort the
>>> transaction processing with a transport exception.
>>> 
>>> Willem Jiang
>>> 
>>> Twitter: willemjiang
>>> Weibo: 姜宁willem
>>> 
>>> On Tue, Jun 25, 2019 at 8:42 AM Zhang Lei <[email protected]> wrote:
>>>> 
>>>> Hi, Zhang jun
>>>> 
>>>>> I just cared about the recovery scan thread design.
>>>>> Kafka can ensure that event messages are consumed by alpha exactly,
>>>>> but recovery needs to know the responses of all participating
>>>>> transactions to decide whether to roll back or commit, so I think the
>>>>> scan thread is still necessary.
>>>> 
>>>> I am not sure, but I think Akka's persistence can solve the problem
>>>> you care about.
>>>> Of course, this ability needs to be verified.
>>>> 
>>>> Thanks,
>>>> Zhang Lei
>>>> 
>>>>> On Jun 24, 2019, at 10:46 AM, 赵俊 <[email protected]> wrote:
>>>>> 
>>>>> Hi, Zhang Lei
>>>>> 
>>>>>> A1 : I think we only need to ensure that the message can be reliably
>>>>>> delivered to the state machine. The state machine only records state
>>>>>> transitions synchronously when the transaction executes normally. At
>>>>>> present, the compensation method based on table scan is also
>>>>>> asynchronous. I am not sure if I have answered your question, or you
>>>>>> can give me more information.
>>>>> 
>>>>> If we have a mechanism that ensures the main service can correctly
>>>>> collect the responses of all participating transactions from alpha
>>>>> before commit/rollback, it is OK.
>>>>> 
>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
>>>>>> same as before, based on the database.
>>>>>> A2 : I think the question you care about is how to recover when the
>>>>>> alpha is down. This is a little different from the current version.
>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>>>> topic, one message at a time.
>>>>>> 2. Of course, we can also do some extra design for it, such as
>>>>>> writing the data to a local log file after receiving the Kafka
>>>>>> message, and replaying the messages from that log file when the alpha
>>>>>> machine restarts.
>>>>> 
>>>>> I just cared about the recovery scan thread design.
>>>>> Kafka can ensure that event messages are consumed by alpha exactly,
>>>>> but recovery needs to know the responses of all participating
>>>>> transactions to decide whether to roll back or commit, so I think the
>>>>> scan thread is still necessary.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jun 23, 2019, at 1:04 PM, Zhang Lei <[email protected]> wrote:
>>>>>> 
>>>>>> Hi, Zhao Jun
>>>>>> 
>>>>>> Thank you for your reply!
>>>>>> 
>>>>>> This design document does not elaborate on reliability aspects.
>>>>>> 
>>>>>> My initial thoughts are as follows.
>>>>>> 
>>>>>> Q1 : It seems that omega should hold on after consuming the event
>>>>>> message from Kafka instead of completing pushing message
>>>>>> A1 : I think we only need to ensure that the message can be reliably
>>>>>> delivered to the state machine. The state machine only records state
>>>>>> transitions synchronously when the transaction executes normally. At
>>>>>> present, the compensation method based on table scan is also
>>>>>> asynchronous. I am not sure if I have answered your question, or you
>>>>>> can give me more information.
>>>>>> 
>>>>>> Q2 : We should also consider recovery; it seems that recovery is the
>>>>>> same as before, based on the database.
>>>>>> A2 : I think the question you care about is how to recover when the
>>>>>> alpha is down. This is a little different from the current version.
>>>>>> 1. We can rely on Kafka's reliability and control the offset of the
>>>>>> topic, one message at a time.
>>>>>> 2. Of course, we can also do some extra design for it, such as
>>>>>> writing the data to a local log file after receiving the Kafka
>>>>>> message, and replaying the messages from that log file when the alpha
>>>>>> machine restarts.
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Lei Zhang
>>>>>> 
>>>>>>> On Jun 23, 2019, at 7:08 AM, zhaojun <[email protected]> wrote:
>>>>>>> 
>>>>>>> I have some questions about the design.
>>>>>>> 1. It seems that omega should hold on after consuming the event
>>>>>>> message from Kafka instead of completing pushing message.
>>>>>>> 2. We should also consider recovery; it seems that recovery is the
>>>>>>> same as before, based on the database.
>>>>>>> 
>>>>>>> ------------------
>>>>>>> Zhao Jun
>>>>>>> Apache Sharding-Sphere & ServiceComb
>>>>>>> 
>>>>>>>> On Jun 21, 2019, at 6:41 PM, Zhang Lei <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I have created the alpha-fsm module on branch SCB-1321 and submitted
>>>>>>>> the design documentation [1], state machine prototype and test cases.
>>>>>>>> If there is any problem, please let me know.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Lei Zhang
>>>>>>>> 
>>>>>>>> [1] https://github.com/apache/servicecomb-pack/tree/SCB-1321/alpha/alpha-fsm
>>>>>>>> 
>>>>>>>>> On Jun 20, 2019, at 3:25 PM, Zheng Feng <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Yeah, I think Willem has created one [1] before. Do you mind if I
>>>>>>>>> assign this issue to you?
>>>>>>>>> 
>>>>>>>>> [1] https://issues.apache.org/jira/browse/SCB-1258
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 20, 2019 at 2:34 PM Zhang Lei <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, Zheng Feng
>>>>>>>>>> 
>>>>>>>>>> Thanks for your advice. I will create a JIRA first and start with
>>>>>>>>>> the design documentation.
>>>>>>>>>> 
>>>>>>>>>> Lei Zhang
>>>>>>>>>> 
>>>>>>>>>>> On Jun 19, 2019, at 8:09 PM, Zheng Feng <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot for sharing this information! I think this state
>>>>>>>>>>> machine could be very experimental, so it would be helpful to
>>>>>>>>>>> create an experimental branch for this module rather than adding
>>>>>>>>>>> it to the master branch.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 19, 2019 at 5:42 PM Zhang Lei <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I have completed some of the design and a prototype on my GitHub.
>>>>>>>>>>>> 
>>>>>>>>>>>> In the design document [1], my original idea was that a
>>>>>>>>>>>> transaction consisted of a SagaActor and several TxActors. Later
>>>>>>>>>>>> TxActor was removed to reduce implementation complexity.
>>>>>>>>>>>> I haven't had time to update the documentation yet, but the
>>>>>>>>>>>> SagaActor state machine [2] is up to date.
>>>>>>>>>>>> Here you can see the test cases of SagaActor [3].
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://github.com/coolbeevip/playground/tree/master/state_machine_demo/saga-akkafsm
>>>>>>>>>>>> [2] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/assets/saga_state_diagram.png
>>>>>>>>>>>> [3] https://github.com/coolbeevip/playground/blob/master/state_machine_demo/saga-akkafsm/src/test/java/coolbeevip/playgroud/statemachine/saga/SagaActorTest.java
>>>>>>>>>>>> 
>>>>>>>>>>>> Lei Zhang
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 19, 2019, at 2:34 PM, zhaojun <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we use Akka, how should we design the actors, and how can we
>>>>>>>>>>>>> guarantee that omega receives the message synchronously?
