Sorry. I have attached the image with this.

On Wed, Oct 25, 2017 at 10:33 AM, Sriskandarajah Suhothayan <[email protected]> wrote:
The image is missing.

On Sun, Oct 22, 2017 at 3:48 PM, Anoukh Jayawardena <[email protected]> wrote:

Hi All,

This is an update on the WSO2 Stream Processor's 2-node minimum HA deployment. While some of the design elements changed during the implementation, this email covers the overall design.

Two SP nodes work in an active-passive configuration where both nodes receive and process the same events, but only the active node publishes the events. If the active node goes down, the passive node changes its role and starts publishing events as the active node. When the node that went down is restarted, it acts as the passive node, ready to change roles when needed. For this to work, it must be ensured that both nodes receive the same events continuously.

To ensure at-least-once publishing, the following techniques were adopted.

1. Double State Syncing of the Passive Node

The basis of the HA implementation is that both nodes have the same state at a given time. For this, when the passive node starts up it should sync with the active node. This syncing is done in one of two user-configurable ways, i.e. live sync enabled or disabled.

When live sync is enabled and a Siddhi application is deployed on the passive node, a REST call is made to the active node to get a snapshot of the deployed Siddhi application. Once the snapshot is received and restored on the passive node, the Siddhi application is deployed. If such a snapshot is not found, the passive node defers the deployment of the Siddhi application for a user-configurable time period, after which another state sync occurs. If a snapshot is still not received, the Siddhi application is left in an inactive state. When live sync is disabled, the snapshot of the Siddhi application is taken from the active node's last persisted state, which is read from either the database or the file system.
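The live-sync startup flow described above (fetch snapshot, restore, deploy; defer and retry once if no snapshot exists; otherwise leave the app inactive) could be sketched as follows. This is a minimal illustration, not the actual SP code: the callback-injection style and all names here are assumptions.

```python
import time


def deploy_with_live_sync(fetch_snapshot, restore, deploy,
                          retry_after_seconds=30):
    """Sketch of the passive node's live-sync deployment of one Siddhi app.

    fetch_snapshot: callable returning the active node's snapshot bytes,
                    or None if the active node has no snapshot for the app
                    (in the real system this would be the REST call).
    restore, deploy: callables restoring state and deploying the app.
    retry_after_seconds: user-configurable deferral before the second sync.
    """
    for attempt in range(2):
        snapshot = fetch_snapshot()
        if snapshot is not None:
            restore(snapshot)      # restore state before deployment
            deploy()               # only deploy once the state is in place
            return "deployed"
        if attempt == 0:
            # Defer deployment for a configurable period, then sync again.
            time.sleep(retry_after_seconds)
    # No snapshot after the second sync: the app stays inactive.
    return "inactive"
```

For example, a fetch that always fails leaves the app in the "inactive" state after one deferred retry, matching the behaviour described above.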
After the initial syncing of snapshots, the passive node's sources may not connect in time to process events, which means the passive node is not 100% in sync with the active node. Hence a second syncing of states happens after a user-configured time period from server start. A snapshot of all Siddhi applications is taken and restored on the passive node. Since the state may be large, an event queue is implemented on the passive node to cover the time taken for the snapshot to reach the passive node and be restored. During the syncing, the passive node queues all events and then starts processing from the point where the active node stopped processing to take the snapshot. This guarantees that the active and passive nodes process the same set of events.

When transferring snapshots using live sync, the snapshots are compressed with gzip to reduce their size and the time taken to reach the passive node.

2. Periodic Output Syncing of the Passive Node

Although both the active and passive nodes process the same events in a fully synced manner, once the active node goes down the passive node may take some time to detect this and start publishing events, so some events might otherwise go unpublished. As a solution, the passive node queues all processed events (per event sink). Periodically, the passive node trims this queue according to the last event published by the active node. When the passive node detects that the active node is down, it first publishes the events remaining in the queue. This guarantees that no events are dropped.

Enabling live sync allows the passive node to ping the active node periodically, get the timestamp of the last published event, and use it to trim the queue.
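The per-sink output queue just described (enqueue every processed event, trim by the active node's last-published timestamp, flush on failover) could be sketched like this. The class and method names are assumptions for illustration, not the actual SP implementation.

```python
from collections import deque


class OutputBuffer:
    """Sketch of the passive node's per-sink buffer of processed but
    unpublished events."""

    def __init__(self):
        self.queue = deque()  # (timestamp, event) pairs, oldest first

    def enqueue(self, timestamp, event):
        """Called for every event the passive node processes for this sink."""
        self.queue.append((timestamp, event))

    def trim(self, last_published_ts):
        """Periodically drop events the active node has already published,
        using the last-published timestamp obtained via live sync (or read
        from the database when live sync is disabled)."""
        while self.queue and self.queue[0][0] <= last_published_ts:
            self.queue.popleft()

    def flush(self, publish):
        """On failover, publish everything still buffered, oldest first.
        This gives at-least-once delivery: events the failed active node
        published after its last reported timestamp may be sent twice."""
        while self.queue:
            _, event = self.queue.popleft()
            publish(event)
```

For instance, if events with timestamps 1 through 5 are buffered and the active node last reported publishing timestamp 2, a trim followed by a failover flush publishes events 3, 4, and 5.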
When live sync is disabled, the active node periodically saves the same information in the database, and the passive node periodically reads this value.

Find the user story of the implementation at [1], which has details about the configuration of the 2-node minimum HA.

[1] https://redmine.wso2.com/issues/6724

Thanks
Anoukh

On Fri, Aug 18, 2017 at 2:01 PM, Anoukh Jayawardena <[email protected]> wrote:

+architecture

On Thu, Aug 17, 2017 at 3:24 PM, Anoukh Jayawardena <[email protected]> wrote:

Hi All,

This is a high-level overview of the 2-node minimum high availability (HA) deployment feature for the Stream Processor (SP). The implementation adopts an active-passive approach with periodic state persistence. The process flow of this feature is as follows.

Prerequisites

- 2 SP workers, one acting as the active worker and the other as the passive. Both nodes should have the same Siddhi applications deployed.
- A specified RDBMS or file location for periodic persistence of Siddhi app states.
- A running ZooKeeper service or RDBMS instance for coordination between the two nodes (carbon-coordination [1] will be used for distributed coordination).
- A Siddhi client that publishes events to both the active and passive nodes in a synced manner, where a message is published to the passive node only after the active node acknowledges receiving it.

[image: 2 Node HA Overview.jpg]

Process

- Both nodes receive events and process them, but only the active node publishes the output events. This ensures that both nodes stay in sync.
- The active node periodically persists the Siddhi app states so that the state can be retrieved in a failover scenario.
- A user-defined "Live State Sync" option determines how the states are synced between the active and passive nodes in a failover scenario (explained below).
- The implementation behaves as follows in different system states:

1. A new node starts up while the active node is available
   - "Live State Sync" enabled: The new node detects that the active node is available, registers itself as the passive node, and calls the active node to borrow the current state. While state borrowing happens, event processing is paused on both nodes so that data is not lost. After that, both nodes process events in sync.
   - "Live State Sync" disabled: The new node detects that the active node is available, registers itself as the passive node, and reads the last persisted state from the database. This option may lead to some data loss, since the database will not contain a real-time persisted state.
2. A new node starts up while the active node is unavailable
   - The new node detects that the active node is unavailable, registers itself as the active node, and reads the last persisted state from the database. (For a fresh restart, an API should be called beforehand to clean the DB.)
3. The active node goes down
   - The passive node detects that the active node is unavailable, switches roles, and starts publishing the output events.

Data loss minimizing strategies

1. When the active node goes down, the passive node does not know which event was last published by the active node. Therefore the passive node might start publishing events from a later point. For example, consider that both nodes have processed 5 messages but only 2 messages have been published by the active node before failing.
The passive node would then start publishing from the 6th message onwards, since it does not know what has not been published.

As a solution, a queue is implemented on the passive node. The active node knows the last event it published. The passive node periodically pings the active node to get the last published event and dequeues the buffer accordingly, so that events are not lost, but might be duplicated. (This might be a problem in live state sync.)

[1] https://github.com/wso2/carbon-coordination

Thank You
--
Anoukh Jayawardena
Software Engineer
WSO2 Lanka (Private) Limited: http://wso2.com
lean . enterprise . middleware
phone: (+94) 77 99 28932

--
S. Suhothayan
Associate Director / Architect
WSO2 Inc. http://wso2.com
lean . enterprise . middleware
cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ | twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
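Putting the thread's core rule together (both nodes process every event; only the active node publishes; on failover the new active node first drains its buffer so nothing is dropped), a minimal model might look like the following. This is a sketch of the behaviour described above under assumed names; coordination and sources/sinks are stubbed out.

```python
class HANode:
    """Minimal model of one SP worker's active/passive publishing rule."""

    def __init__(self, publish):
        self.role = "passive"      # a restarted node always joins as passive
        self.publish = publish     # sink callback (stub for the event sink)
        self.pending = []          # events processed but not yet published

    def on_event(self, event):
        # Both nodes process every event; only the active node publishes.
        if self.role == "active":
            self.publish(event)
        else:
            self.pending.append(event)

    def on_active_lost(self):
        # The passive node detects (via coordination) that the active node
        # is down: switch roles and first publish the buffered events so
        # none are dropped, then continue publishing live events.
        self.role = "active"
        for event in self.pending:
            self.publish(event)
        self.pending.clear()
```

In this model, events buffered while passive are replayed in order at failover, before any new events, which is the at-least-once guarantee the thread describes (duplicates are possible, dropped events are not).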
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
