On Mon, Oct 27, 2008 at 3:19 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:

>  Scott,
>
> Great fix!  I cannot make it fail again.
>

Any noticeable performance difference?

-Scott


>
> Thanks,
> Adam
>
> Scott Battaglia wrote:
>
> Let me know how the test goes.  Also let me know if you find any bugs
> since, again, it was written in about 30 seconds :-)
>
> -Scott
>
> -Scott Battaglia
> PGP Public Key Id: 0x383733AA
> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>
>
> On Mon, Oct 27, 2008 at 1:26 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:
>
>> Scott,
>>
>> Great.  I will grab it and retest.  This will probably solve the issue.
>>
>> Adam
>>
>> Scott Battaglia wrote:
>>
>>  On Fri, Oct 24, 2008 at 7:58 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:
>>
>>> Scott,
>>>
>>> I mis-diagnosed the issue.  I just ran the same test, except I only ran
>>> one instance of memcached.  I am getting a high error rate on ticket
>>> validations.  So, it has nothing to do with memcached replication.  To
>>> investigate further, I disabled the second CAS server, and all errors are
>>> gone.  Of course that is not a viable workaround.  :-)
>>>
>>> My guess is that the error occurs when a ticket issued by one CAS server
>>> is being validated on another CAS server.  I could not find a way to enable
>>> debug logging in /cas/serviceValidate, but I think I have found a major
>>> clue.  It took most of the day today to hunt this down.
>>>
>>> With a single instance of memcached running in verbose mode, you can see
>>> a sequence of messages like this:
>>> ------------------------------
>>> <11 add ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>>> >11 STORED
>>> <7 get ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>>> >7 sending key ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>>> >7 END
>>> <7 replace ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>>> >7 STORED
>>> <7 delete ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 0
>>> >7 DELETED
>>>  ------------------------------
>>> This is a case where everything went OK.  The sequence below, however,
>>> represents a service ticket that failed to validate.  That's apparently
>>> because an attempt to read the ticket was made before it was actually
>>> stored in the cache!
>>> ------------------------------
>>> <11 add ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1 1 300 2689
>>> <7 get ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1
>>> >7 END
>>> >11 STORED
>>> ------------------------------
>>> There may be some code that synchronizes access to the same object from
>>> the same client.  However, it would seem that the service ticket is returned
>>> by CAS before it's actually stored in memcached.  If this service ticket is
>>> then presented to another instance of CAS for validation, that instance
>>> fails to retrieve it from memcached because the "add" operation has not
>>> completed.
>>>
>>> Again, I have to emphasize that this is not an unrealistic test.  JMeter
>>> is simply following redirects at the time of the failure, just as a
>>> browser would.
>>>
>>
>> We never saw that in production, and we ran 500 virtual users.  However,
>> if you are experiencing it, you most likely could update the
>> MemcacheTicketRegistry to block on the Futures.  I've actually updated the
>> code in HEAD with an option to block on Futures. :-)
>>
>> I have not tried it at all, since I wrote it all of 30 seconds ago.  You
>> can grab it from HEAD and try it out.  The new property to enable it is
>> "synchronizeUpdatesToRegistry".
>>
>> Let me know if it helps/doesn't help.
>>
>> -Scott
>>
>>
>>>
>>> Adam
>>>
>>> Scott Battaglia wrote:
>>>
>>>  You have no need for sticky sessions.  If you have two repcached
>>> servers and you've told your CAS instance about both of them, the memcached
>>> client essentially sees them as two memcached servers (since it's not
>>> familiar with repcached).
>>>
>>> The memcached client works by taking a hash of the key, which determines
>>> which instance of memcached/repcached the item is stored on.  repcached
>>> will then do its async replication.  When you come to validate a ticket,
>>> the memcached client will again hash the key to determine which server the
>>> item is stored on.  If that server is unreachable (as determined by the
>>> memcached client), it will try the next most likely server to hold the
>>> data.  A rough sketch of the idea follows.
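>>>
>>> To make that concrete, here is a rough sketch of the selection logic.
>>> Real clients use better hash functions and consistent hashing; the class
>>> and method names here are made up:
>>> ------------------------------
>>> import java.util.List;
>>>
>>> public class ServerSelectionSketch {
>>>     // e.g. ["memcache1:11211", "memcache2:11211"]
>>>     private final List<String> servers;
>>>
>>>     public ServerSelectionSketch(List<String> servers) {
>>>         this.servers = servers;
>>>     }
>>>
>>>     // The key alone determines the server, so every CAS node
>>>     // computes the same answer for the same ticket id.
>>>     public String primaryFor(String key) {
>>>         int h = key.hashCode() & 0x7fffffff; // non-negative hash
>>>         return servers.get(h % servers.size());
>>>     }
>>>
>>>     // If the primary is unreachable, fall back to the "next"
>>>     // server in the same deterministic order.
>>>     public String fallbackFor(String key) {
>>>         int h = key.hashCode() & 0x7fffffff;
>>>         return servers.get((h + 1) % servers.size());
>>>     }
>>> }
>>> ------------------------------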
>>>
>>> -Scott
>>>
>>> -Scott Battaglia
>>> PGP Public Key Id: 0x383733AA
>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>>
>>>
>>> On Fri, Oct 24, 2008 at 8:21 AM, Andrew Ralph Feller, afelle1 <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> So what you are saying is that even with replication enabled, CAS
>>>> clusters using asynchronous replication should have sticky sessions
>>>> regardless?  I realize that CAS clusters using synchronous replication
>>>> have no need of sticky sessions, since the data goes to all servers
>>>> before the user can move on.
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On 10/23/08 9:29 PM, "Scott Battaglia" <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>    It actually shouldn't matter whether the async replication works or
>>>> not.  The memcache clients are designed to hash to a particular server and
>>>> only check the backup servers if the primary isn't available.
>>>>
>>>> So you should always be validating against the original server unless
>>>> it's no longer there.
>>>>
>>>> -Scott
>>>>
>>>> -Scott Battaglia
>>>> PGP Public Key Id: 0x383733AA
>>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>>>
>>>>
>>>> On Thu, Oct 23, 2008 at 9:17 PM, Adam Rybicki <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>
>>>> Scott,
>>>>
>>>> I have run into an issue with MemcacheTicketRegistry and was wondering if
>>>> you have any thoughts.  I didn't want to create a new thread for this note.
>>>>  Anyone else with comments should feel free to reply, too. ;-)
>>>>
>>>> My tests have shown that when a ticket is generated on a CAS cluster
>>>> member, it may sometimes fail to validate.  This is apparently because the
>>>> memcached asynchronous replication did not manage to send the ticket
>>>> replica in time.  Fast as repcached may be, under a relatively light load,
>>>> ST validation failed in 0.1% of the cases, or once in 1000 attempts.  It
>>>> would seem that the following sequence of steps should take a fair amount
>>>> of time:
>>>>
>>>>    - Browser accesses a CAS-protected service
>>>>    - Service redirects to CAS for authentication
>>>>    - CAS validates the TGT
>>>>    - CAS issues the ST for service
>>>>    - CAS redirects the browser to service
>>>>    - Service sends the ST for validation
>>>>
>>>> But they are fast!  My JMeter testing showed this taking 28 milliseconds
>>>> under light load on the CAS server, which is amazingly fast.  Please note
>>>> that in real life this can be just as fast, because the browser, CAS, and
>>>> service perform these steps without the user slowing them down.  CAS is
>>>> indeed a lightweight system, and memcached does nothing to slow it down.
>>>> It seems that in 0.1% of the cases this round trip outperforms repcached
>>>> under light load.  The bad news is that under heavy load the failure rate
>>>> increases.  I've seen failure rates as bad as 8%.
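>>>>
>>>> (For reference, the validation in the last step boils down to a GET
>>>> against /cas/serviceValidate, roughly like the sketch below.  The host
>>>> names are made up, and a real service would parse the XML response
>>>> rather than print it.)
>>>> ------------------------------
>>>> import java.io.BufferedReader;
>>>> import java.io.InputStreamReader;
>>>> import java.net.URL;
>>>> import java.net.URLEncoder;
>>>>
>>>> public class ValidateSketch {
>>>>     public static void main(String[] args) throws Exception {
>>>>         String service =
>>>>                 URLEncoder.encode("https://app.example.edu/", "UTF-8");
>>>>         String ticket = "ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1";
>>>>         URL url = new URL("https://cas.example.edu/cas/serviceValidate"
>>>>                 + "?service=" + service + "&ticket=" + ticket);
>>>>         BufferedReader in = new BufferedReader(
>>>>                 new InputStreamReader(url.openStream(), "UTF-8"));
>>>>         // CAS answers with a <cas:authenticationSuccess> or
>>>>         // <cas:authenticationFailure> XML block.
>>>>         for (String line; (line = in.readLine()) != null; ) {
>>>>             System.out.println(line);
>>>>         }
>>>>         in.close();
>>>>     }
>>>> }
>>>> ------------------------------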
>>>>
>>>> Have you or anyone else seen this?  Have you had to work around this?
>>>>
>>>> Thanks,
>>>> Adam
>>>>
>>>> Scott Battaglia wrote:
>>>>
>>>>
>>>> On Tue, Oct 14, 2008 at 11:15 AM, Andrew Ralph Feller, afelle1 <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>> Hey Scott,
>>>>
>>>> Thanks for answering some questions; really appreciate it.  Just a
>>>> handful more:
>>>>
>>>>    1. What happens whenever the server it intends to replicate with is
>>>>    down?
>>>>
>>>> It doesn't replicate :-) The client will send its request to the primary
>>>> server, and if the primary server is down it will write to the secondary
>>>> instead.  The repcache server itself will not replicate to the other
>>>> server if it can't find it.
>>>>
>>>>    2. What happens whenever it comes back up?
>>>>
>>>> The repcache servers will sync with each other.  The memcache clients
>>>> will continue to function as they should.
>>>>
>>>>    3. Does the newly recovered machine synchronize itself with the
>>>>    other servers?
>>>>
>>>> The newly recovered machine will synchronize with its paired memcache
>>>> server.
>>>>
>>>> -Scott
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>> On 10/14/08 9:56 AM, "Scott Battaglia" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Memcache, as far as I know, uses a hash of the key to determine which
>>>> server to write to (and then with repcache, it's replicated to its pair,
>>>> which you configure).
>>>>
>>>> -Scott
>>>>
>>>> -Scott Battaglia
>>>> PGP Public Key Id: 0x383733AA
>>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>>>
>>>>
>>>>  On Tue, Oct 14, 2008 at 10:38 AM, Andrew Ralph Feller, afelle1 <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>> Scott,
>>>>
>>>> I've looked at the sample configuration file on the JA-SIG wiki; however,
>>>> I was curious how memcached handles cluster membership, for lack of a
>>>> better word.  One of the things we are getting burned on by JBoss/JGroups
>>>> is the frequency with which the cluster becomes fragmented.
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>  On 10/14/08 8:58 AM, "Scott Battaglia" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> We've disabled the registry cleaners since memcached has explicit
>>>> timeouts (which are configurable on the registry).  We've configured it by
>>>> default with 1 GB of RAM I think, though I doubt we need that much.
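>>>>
>>>> For instance (just a sketch of the expiry behavior via spymemcached --
>>>> host, port, and key are made up; the 300-second TTL matches the
>>>> "add ... 1 300 2689" lines earlier in this thread):
>>>> ------------------------------
>>>> import java.net.InetSocketAddress;
>>>> import net.spy.memcached.MemcachedClient;
>>>>
>>>> public class ExpirySketch {
>>>>     public static void main(String[] args) throws Exception {
>>>>         MemcachedClient client = new MemcachedClient(
>>>>                 new InetSocketAddress("localhost", 11211));
>>>>         // memcached discards the entry on its own after 300 seconds,
>>>>         // so no registry cleaner thread is needed.
>>>>         client.add("ST-1234-example", 300, "ticket-payload");
>>>>         // Within the TTL this prints the value; afterwards, null.
>>>>         System.out.println(client.get("ST-1234-example"));
>>>>         client.shutdown();
>>>>     }
>>>> }
>>>> ------------------------------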
>>>>
>>>> -Scott
>>>>
>>>> -Scott Battaglia
>>>> PGP Public Key Id: 0x383733AA
>>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>>>
>>>>  On Mon, Oct 13, 2008 at 11:41 PM, Patrick Hennessy <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>> I've been working on updating from 3.2 to 3.3 and wanted to give
>>>> memcached a try instead of JBoss.  I read Scott's message about
>>>> performance and we've had good success here with memcached for other
>>>> applications.  It also looks like using memcached instead of JBoss will
>>>> simplify the configuration changes for the CAS server.
>>>>
>>>> I do have the JBoss replication working with CAS 3.2, but pounding the
>>>> heck out of it with JMeter will cause some not-so-nice stuff to happen.
>>>>   I'm using VMware VI3 and configured an isolated switch for the
>>>> clustering and Linux-HA traffic.  I do see higher traffic levels coming
>>>> to my cluster in the future, but I'm not sure if they'll reach the levels
>>>> from my JMeter test.  (I'm just throwing this out there because of the
>>>> recent best-practice thread.)
>>>>
>>>> If I use memcached, is the ticketRegistryCleaner not needed anymore?  I
>>>> left those beans in the ticketRegistry.xml file and saw all kinds of
>>>> errors.  After taking them out it seems to load fine and appears to work,
>>>> but I wasn't sure what the behavior is and I haven't tested it further.
>>>>   What if memcached fills up all the way?  Does anyone have a general
>>>> idea of how much memory to allocate to memcached with regard to
>>>> concurrent logins and tickets stored?
>>>>
>>>> Thanks,
>>>>
>>>> Pat
>>>> --
>>>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>>> Patrick Hennessy                          ([EMAIL PROTECTED])
>>>> Senior Systems Specialist
>>>> Division of Information and Educational Technology
>>>> Delaware Technical and Community College
>>>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>>>
>>>>  --
>>>> Andrew R. Feller, Analyst
>>>> Information Technology Services
>>>> 200 Fred Frey Building
>>>> Louisiana State University
>>>> Baton Rouge, LA 70803
>>>> (225) 578-3737 (Office)
>>>> (225) 578-6400 (Fax)
>>>>
>>>>
>>>
>>
>
_______________________________________________
Yale CAS mailing list
[email protected]
http://tp.its.yale.edu/mailman/listinfo/cas
