OK, Marvin, thanks for this insight. We are currently running a three-node cluster with Ehcache configured to replicate to the other two nodes. Would going the memcached/repcached route put us in a two-node cluster? Is that sufficient in your opinion, or should we be adding an additional server and replicating between two two-node repcached pairs, or not use repcached at all and just configure the bean to hit two or three servers directly?
I have adjusted the heap in Jetty (Ehcache in production) to 2G, and we have been watching the memory usage over the last couple of days. I did notice a bit of a spike, of which I have attached a GIF. The system seemed to recover eventually, but from 3:30 until 5 PM the heap was in the range I have defined as a warning threshold. I am not sure what is going on there at that time, so I am looking for thoughts here as well. I intend to get a chart of the login rate over this period to see whether there is any correlation with activity, or if this is something "random" floating around. Your collective thoughts are always appreciated.

-Michael

-----Original Message-----
From: Marvin Addison [mailto:[email protected]]
Sent: Saturday, June 29, 2013 6:25 AM
To: [email protected]
Subject: Re: [cas-user] ehCache, heap, timeouts?

> What I would like is for the replication to use either async or a
> timeout value where, if the other peer nodes are under duress, the
> sending threads don't block; rather they time out quickly, or send the
> tickets a bit more asynchronously, without the client on the hook.

Asynchronous replication has the potential to cause application errors. For example, a ticket issued from node1 and validated at node2 (which _will_ happen occasionally, even with LB session affinity, since the two requests are sourced differently) may fail to validate if replication is slower than the ticket validation request. For that reason I removed the async write option in the memcached ticket registry, under the premise that every registry should guarantee correct behavior irrespective of environment. You may be able to shoot yourself in the foot with Ehcache; proceed carefully.
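For reference, the sync/async trade-off described above is controlled per cache in Ehcache's RMI replication. Below is a hedged ehcache.xml sketch; the cache name, TTL, and interval values are illustrative assumptions, not taken from this thread:

```xml
<!-- Illustrative ehcache.xml fragment; cache name and sizing/TTL values
     are examples only -- adapt to your ticket registry configuration. -->
<cache name="ticketGrantingTicketsCache"
       maxElementsInMemory="10000"
       eternal="false"
       timeToLiveSeconds="28800">
  <!-- replicateAsynchronously=true queues replication events and flushes
       them from a background thread, so the request thread is not on the
       hook -- but a ticket validated on a peer before the batch flushes
       can fail, as described above. Setting it to false blocks the
       calling thread until replication to peers completes. -->
  <cacheEventListenerFactory
      class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
      properties="replicateAsynchronously=true,
                  asynchronousReplicationIntervalMillis=200,
                  replicatePuts=true,
                  replicateUpdates=true,
                  replicateUpdatesViaCopy=true,
                  replicateRemovals=true"/>
</cache>
```

With synchronous replication, a slow or unresponsive peer stalls the login request itself, which matches the blocking behavior being asked about.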
> I did find a parameter socketTimeoutMillis, to be placed in the
> cacheManagerPeerListenerFactory stanza, like this:
>
> <cacheManagerPeerListenerFactory
>     class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
>     properties="port=9520, socketTimeoutMillis=2000" />

I would imagine that this only applies to establishing connections, which presumably happens infrequently; for example, when initializing the cache or recovering a connection to a lost peer. From your description, it sounds like you want to apply a timeout to data transfer, which would happen over an established connection. I would recommend you head over to the Ehcache forums to confirm the behavior and to seek further advice.

> I also have an interest in moving the ticket registry (yet again) to
> memcached, considering that it would then put the registry outside the
> JVM heap, so perhaps the storage/replication would be more reliable
> for the three-node cluster we are using. Anyone move from Ehcache to
> memcached?

I encourage you to consider it. We moved from JPA/database to memcached and haven't looked back. Our memcached nodes were initially configured for 1G of memory, but we never came anywhere close to that; IIRC our memory usage is more on the order of 128M or less. The failure mode of memcached nodes is very graceful and suitable for our HA requirements. I can provide more info if interested. It seems relevant to note that some of the largest known CAS deployments run memcached.

M

--
You are currently subscribed to [email protected] as: [email protected]
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/cas-user
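For anyone weighing the same move, wiring CAS to memcached is a small Spring change in deployerConfigContext.xml. The sketch below assumes a CAS 3.x-era MemCacheTicketRegistry; the hostnames and timeout values are illustrative, and the exact class name and constructor argument order should be verified against your CAS release's documentation:

```xml
<!-- Hedged bean wiring sketch; verify the class and constructor
     signature for your CAS version before use. -->
<bean id="ticketRegistry"
      class="org.jasig.cas.ticket.registry.MemCacheTicketRegistry">
  <!-- hostname:port pairs of the memcached nodes (example hosts). -->
  <constructor-arg index="0">
    <list>
      <value>memcached-1.example.edu:11211</value>
      <value>memcached-2.example.edu:11211</value>
    </list>
  </constructor-arg>
  <!-- Ticket-granting ticket expiration, in seconds (e.g. 8 hours). -->
  <constructor-arg index="1" value="28800" />
  <!-- Service ticket expiration, in seconds. -->
  <constructor-arg index="2" value="10" />
</bean>
```

Because memcached stores tickets off-heap and client-side hashing distributes them across nodes, a dead node only loses the tickets hashed to it, which is the graceful failure mode mentioned above.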
<<attachment: EmChartBean.gif>>
