Re: [Architecture] [IS] Circuit Breaker on user store LDAP+JDBC

Ruwan Abeykoon Thu, 18 Oct 2018 18:36:02 -0700

Hi Gayan,
Thanks for the suggestions.
>>According to the Circuit Breaker pattern, once the circuit has tripped and
the timeout period has elapsed, it moves to the Half-Open state and checks
whether the underlying bottleneck still exists. If the issue is gone, the
system should return to its normal state. How is this possible with the
proposed implementation?


Answer: We do not stop the calls. We just "throttle down", which means a
calculated percentage of calls is still attempted against the underlying USM.
The rest are rejected with an exception. So it behaves much like Half-Open.
Quote "3. Throttle down calls to any user store manager if there is
considerable delay in a particular USM. Report the case in error log."
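
A minimal sketch of the "throttle down" idea, assuming the reject ratio is computed elsewhere from the delay history. The names ThrottleSketch, UserStoreThrottledException and setRejectRatio are hypothetical and not existing user-core APIs. The random source is injected only to make the behaviour easy to test.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.DoubleSupplier;

// Hypothetical wrapper around calls to one user store manager (USM).
class ThrottleSketch {

    /** Hypothetical IOException variant used to fail fast. */
    static class UserStoreThrottledException extends IOException {
        UserStoreThrottledException(String message) {
            super(message);
        }
    }

    private final DoubleSupplier random;        // must return values in [0, 1)
    private volatile double rejectRatio = 0.0;  // 0.0 means no throttling

    ThrottleSketch(DoubleSupplier random) {
        this.random = random;
    }

    /** Sets the fraction of calls to reject, clamped to [0, 1]. */
    void setRejectRatio(double ratio) {
        this.rejectRatio = Math.max(0.0, Math.min(1.0, ratio));
    }

    /** Rejects roughly rejectRatio of the calls; the rest still reach the USM. */
    <T> T call(Callable<T> usmCall) throws Exception {
        if (random.getAsDouble() < rejectRatio) {
            throw new UserStoreThrottledException("User store call throttled");
        }
        return usmCall.call();
    }
}
```

In real use the random source could be `() -> ThreadLocalRandom.current().nextDouble()`. Because some calls always get through while the ratio is below 1, the delay history keeps receiving fresh samples and the ratio can drop again once the USM recovers.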

>>Where is this class going to be located: in user-core or in some other
identity component?
Answer: The most logical place is the User-Core.

>>If possible, sample numbers would make points 1 and 2 easier to understand.
Answer: I will come back to this bit later.
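
In the meantime, here is a rough sketch of steps 1 to 3 of the algorithm with made-up numbers. These are not measurements. It assumes that "IO delay of each request" means the average delay per USM over the history window, and that the sum runs over all USMs. HistogramSketch and UsmStats are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HistogramSketch {

    /** Request count and average IO delay (ms) for one USM over the history window. */
    static final class UsmStats {
        final long requests;
        final double avgDelayMs;

        UsmStats(long requests, double avgDelayMs) {
            this.requests = requests;
            this.avgDelayMs = avgDelayMs;
        }

        double weight() {
            return requests * avgDelayMs;
        }
    }

    /** Step 1: H per USM = (requests * delay) / sum over all USMs of (requests * delay). */
    static Map<String, Double> shares(Map<String, UsmStats> stats) {
        double total = 0.0;
        for (UsmStats s : stats.values()) {
            total += s.weight();
        }
        Map<String, Double> h = new LinkedHashMap<>();
        for (Map.Entry<String, UsmStats> e : stats.entrySet()) {
            h.put(e.getKey(), total == 0.0 ? 0.0 : e.getValue().weight() / total);
        }
        return h;
    }

    /** Step 2: throttling activates when blocked threads exceed factor * total threads. */
    static boolean throttlingActive(int blockedThreads, int totalThreads, double factor) {
        return blockedThreads > factor * totalThreads;
    }

    /** Step 3.1: a USM is throttled only if both configurable thresholds are exceeded. */
    static boolean shouldThrottle(double h, double ioWaitMs, double hThreshold, double waitThresholdMs) {
        return h > hThreshold && ioWaitMs > waitThresholdMs;
    }
}
```

For example, suppose a slow LDAP store received 100 requests at 950 ms each and a JDBC store received 200 requests at 25 ms each. The weights are 95000 and 5000, so H is 0.95 for LDAP and 0.05 for JDBC. With 200 Tomcat threads, a factor of 0.5 and 120 threads blocked in USM calls, throttling is active (120 > 100). Only LDAP passes step 3.1 (H > 0.1 and IO wait > 50 ms), so about 95% of its requests would get the IOException while JDBC is untouched.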

Cheers,
Ruwan



On Mon, Oct 15, 2018 at 11:20 PM gayan gunawardana <[email protected]>
wrote:

>
> Hi Ruwan,
>
> This is a very good initiative and I have a few things to clarify.
> On Sun, Oct 14, 2018 at 8:38 AM Ruwan Abeykoon <[email protected]> wrote:
>
>> Hi Devs,
>>
>> *Why ${subject}?*
>> Implementing the "Circuit Breaker" pattern in the user store manager is
>> becoming essential for multi-tenant and multi-user-store-manager (USM)
>> use cases in IS. Here are the reasons.
>>
>> a) IS connects to heterogeneous user stores implemented in LDAP/AD, JDBC,
>> AWS, NoSQL, which have different timing characteristics.
>> b) Each user store may be hosted outside the data center in which IS
>> resides. The network delay and connection characteristics affect
>> IO waits.
>> c) A single user store that causes a few seconds of IO wait can starve
>> the entire HTTP processing thread pool (Tomcat pool) when an average load
>> (e.g. 100 TPS) hits the offending user store.
>>
>> *How?*
>> Hence I propose adding a layer around user store manager calls. It would
>> do the following:
>> 1. Track the delay of each call to the user store manager.
>> 2. Keep a histogram of delay per call. The history is kept in memory for
>> a few minutes.
>> 3. Throttle down calls to any user store manager if there is considerable
>> delay in a particular USM. Report the case in error log.
>> 4. Throttling down means throwing a variant of IOException, so that the
>> call (authentication, get claim, etc.) fails fast.
>> 5. This avoids starving the Tomcat thread pool unnecessarily on a
>> misbehaving (slow) USM, so that the system stays responsive.
>>
> According to the Circuit Breaker pattern, once the circuit has tripped and
> the timeout period has elapsed, it moves to the Half-Open state and checks
> whether the underlying bottleneck still exists. If the issue is gone, the
> system should return to its normal state. How is this possible with the
> proposed implementation?
>
>>
>> *Algorithm*
>> 1. Calculation of histogram
>> H (per USM) =
>>     (number of requests received * IO delay of each request) for that USM
>>     / sum over all USMs of (number of requests received * IO delay of
>>     each request)
>>
>> 2. Activate throttling
>> Throttling is activated if threads blocked in USM > pre-defined factor *
>> total Tomcat threads
>>
> If possible, sample numbers would make points 1 and 2 easier to understand.
>
>>
>> 3. Throttling
>> An IOException is thrown for a request when:
>> 3.1 - H > 0.1 (say) and IO wait > 50 ms (say) (both factors are
>> configurable)
>> 3.2 - Requests are then rejected with an IOException in the ratio H.
>>
>> With the above algorithm, the circuit breaker kicks in when there is
>> significant IO delay and threads appear to be starving because of it.
>> There is no throttling when the system behaves well, i.e. when there is
>> no significant IO (network) delay.
>>
>
>> *Effort*
>> Adding a layer to do the "circuit breaker" is not hard. We need to wrap
>> all calls to the existing USM with a "CircuitBreaker" (new class) which
>> keeps track of calls and throws the necessary IOException.
>>
> Where is this class going to be located: in user-core or in some other
> identity component?
> *As a side note:* I have seen a similar problem in a two-node cluster when
> session data persistence is enabled: the authentication operation takes
> longer due to heavy database operations (when the DB is not responding as
> expected). In my experience, throughput drops below that of a single node
> with session data persistence disabled. It would be great to have a
> similar kind of solution for session persistence as well.
>
>>
>> Cheers,
>> Ruwan
>>
>>
>>
>> --
>>
>> *Ruwan Abeykoon*
>> *Associate Director/Architect**,*
>> *WSO2, Inc. http://wso2.com <https://wso2.com/signature> *
>> *lean.enterprise.middleware.*
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>
>
> --
> Gayan
>

