Hi Ruwan,
This is a very good initiative, and I have a few things to clarify.
On Sun, Oct 14, 2018 at 8:38 AM Ruwan Abeykoon <[email protected]> wrote:
> Hi Devs,
>
> *Why ${subject}?*
> Implementing the "Circuit Breaker" pattern in user store manager is becoming
> an essential part when it comes to multi tenant and multi-user store
> manager (USM) use case in IS. Here are the reasons.
>
> a) IS connects heterogeneous user stores implemented in LDAP/AD, JDBC,
> AWS, NoSQL, which has different timing characteristics.
> b) Each user store may be hosted in locations outside the data center
> which IS resides. The network delay, connection characteristics affects
> IO-Waits.
> c) Having single User-Store which causes few seconds of IO wait can starve
> all the HTTP processing thread pool (Tomcat pool) when there is average TPS
> (e.g. 100TPS) hits to offending user store.
>
> *How?*
> Hence I propose adding a layer around user store manager calls. What it
> does are,
> 1. Track delay in each call to user store manager.
> 2. Keep histogram of delay vs each call. History is kept for few minutes
> in memory.
> 3. Throttle down calls to any user store manager if there is considerable
> delay in a particular USM. Report the case in error log.
> 4. Throttle down is to throw a variant of IOException, so that the call
> (authentication, get claim, etc) will fail fast.
> 5. This will help not to starve tomcat thread pool un-necessarily on
> mis-behaving (slow) USM, so that the system is kept responsive.
>
According to the Circuit Breaker pattern, once the circuit has tripped it
moves to the Half-Open state after a timeout period and checks whether the
underlying bottleneck still exists. If the issue is gone, the system should
return to the normal (Closed) state. How would this be achieved with the
proposed implementation?
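To make the question concrete, here is a minimal Java sketch of the
three-state breaker I have in mind. It is only an illustration: the class
name UsmCircuitBreaker, the consecutive-failure threshold, the open timeout
and the injectable clock are all my assumptions, not part of the proposal or
of any existing user-core API. It also trips on consecutive failures rather
than on the delay histogram described in the proposal.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a Closed / Open / Half-Open breaker around a USM call. */
class UsmCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures before tripping
    private final long openTimeoutMillis;  // time to stay OPEN before probing again
    private final LongSupplier clock;      // injectable time source (assumption, eases testing)

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    UsmCircuitBreaker(int failureThreshold, long openTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.clock = clock;
    }

    /** Returns the current state, moving OPEN to HALF_OPEN once the timeout has elapsed. */
    synchronized State getState() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the call unless the circuit is OPEN, in which case it fails fast. */
    <T> T call(Callable<T> usmCall) throws Exception {
        if (getState() == State.OPEN) {
            throw new IOException("Circuit open: failing fast");
        }
        try {
            T result = usmCall.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            throw e;
        }
    }

    // Any success (including a Half-Open probe) closes the circuit again.
    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    // A failed Half-Open probe, or too many failures in a row, (re)opens the circuit.
    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

In this sketch several threads may probe at once while Half-Open; a real
implementation would probably let only a single trial request through.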
>
> *Algorithm*
> 1. Calculation of histogram
> H = Number of request received(per USM)* IO Delay of each request(per
> USM)/ sum(Number of request received(per USM)* IO Delay of each request(per
> USM))
>
> 2. Activate throttling
> Throttle activation, if Threads blocked in USM > pre-defined factor *
> total tomcat threads
>
If possible, sample numbers would make steps 1 and 2 easier to understand.
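For example, this is how I currently read steps 1 and 2, using made-up
numbers; please correct me if I have misunderstood. I am assuming that "IO
Delay of each request" can be taken as the average delay per USM, so that H
is each USM's share of the total IO wait. The class and method names below
are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical worked example of steps 1 and 2; not code from the proposal. */
class HistogramExample {

    /**
     * Step 1 as I read it: H(usm) = requests * avgDelayMs
     *                               / sum over all USMs of (requests * avgDelayMs).
     * Each value in stats is {requestCount, avgDelayMs}.
     */
    static Map<String, Double> shares(Map<String, long[]> stats) {
        double total = 0;
        for (long[] s : stats.values()) {
            total += (double) s[0] * s[1];
        }
        Map<String, Double> h = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            h.put(e.getKey(), total == 0 ? 0.0 : s[0] * s[1] / total);
        }
        return h;
    }

    /** Step 2: throttling activates when blocked threads exceed factor * total threads. */
    static boolean throttlingActive(int blockedThreads, double factor, int totalThreads) {
        return blockedThreads > factor * totalThreads;
    }
}
```

Say a local LDAP USM served 900 requests at 5 ms on average and a remote JDBC
USM served 100 requests at 2000 ms. The weighted delays are 4,500 and 200,000,
so H is about 0.022 for LDAP and about 0.978 for JDBC. With 200 Tomcat threads
and a factor of 0.5, throttling would activate once more than 100 threads are
blocked in USM calls. Is that the intended reading?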
>
> 3. Throttling
> IOException for each request when,
> 3.1 - H > 0.1 (say) and IO Wait > 50ms (say)( both factors are
> configurable)
> 3.2 - Every request in ratio of H will be thrown IOException.
>
> With the above algorithm, the circuit breaker is kicked in when there is
> significant IO Delay and the threads seem to starve due to that. There will
> be no throttling when system behaves well, when no significant IO (network)
> delay.
>
> *Effort*
> Adding a layer to do the "circuit-breaker" is not something hard to do. We
> need to wrap all the calls to existing USM with "CircuitBreaker" (new
> class) which keeps track of calls and throw necessary IOException.
>
Where is this class going to be located: in user-core or in some other
identity component?
*As a side note:* I have seen a similar problem in a two-node cluster when
session data persistence is enabled: authentication takes longer because of
heavy database operations (when the DB is not responding as expected). In my
experience, throughput drops below that of a single node with session data
persistence disabled. It would be great to have a similar kind of solution
for session persistence as well.
>
> Cheers,
> Ruwan
>
>
>
> --
>
> *Ruwan Abeykoon*
> *Associate Director/Architect**,*
> *WSO2, Inc. http://wso2.com <https://wso2.com/signature> *
> *lean.enterprise.middleware.*
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
--
Gayan