Hi all!

It feels like I'm so close, yet so far.  After a couple of days of configuring 
and building I've been able to get annotated POJOs working with TreeCacheAOP 
and Java 1.5.  (Most of that time went to my own bonehead mistakes.)

The cache works well locally but I can't get it to cluster.  I've got a 
listener on the mcast_addr, so I see instances come up and announce that 
they're running with the same cache name.  I don't believe it's a jgroups or 
networking issue (although I've been wrong about 100 times in the last couple 
of days).
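
A minimal sketch of the kind of multicast listener described above, assuming 
the group address 228.1.2.3 and port 41332 from the config in this post.  The 
class name McastSniffer is made up for illustration; it is not part of JBoss 
Cache or JGroups and is not necessarily the tool used here.  It only shows 
which hosts are sending to the group, not whether they end up in the same view.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not a JGroups or JBoss Cache class.
final class McastSniffer {

    // Returns true if the literal IP address is a multicast address
    // (224.0.0.0/4 for IPv4). No DNS lookup happens for literal addresses.
    static boolean isMulticast(String ip) {
        try {
            return InetAddress.getByName(ip).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + ip);
        }
    }

    // Joins the group and prints sender and size of every datagram received.
    // Runs until the thread is interrupted or the process is killed.
    static void listen(String group, int port) throws Exception {
        InetAddress addr = InetAddress.getByName(group);
        MulticastSocket socket = new MulticastSocket(port);
        try {
            // Deprecated on recent JDKs, but available on old and new ones alike.
            socket.joinGroup(addr);
            byte[] buf = new byte[65535];
            while (!Thread.currentThread().isInterrupted()) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                System.out.println(packet.getAddress().getHostAddress() + ":"
                        + packet.getPort() + " sent " + packet.getLength() + " bytes");
            }
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults are the mcast_addr and mcast_port from the posted config.
        String group = args.length > 0 ? args[0] : "228.1.2.3";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 41332;
        if (!isMulticast(group)) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        listen(group, port);
    }
}
```

Seeing packets from every machine in this output would confirm multicast 
reachability only; membership is still decided by the GMS view in the logs.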

Here's what I'm doing:

1) Deploy app to Tomcat 5.5 on machine 1.  Access the app and see TreeCacheAOP 
working (at least locally).  Check the logs and verify that jgroups has bound 
to an address.  I see this, which tells me the cache is running:

0    [http-127.0.0.1-80-1] INFO  org.jboss.cache.PropertyConfigurator  - Found existing property editor for org.w3c.dom.Element: [EMAIL PROTECTED]
31   [http-127.0.0.1-80-1] INFO  org.jboss.cache.PropertyConfigurator  - configure(): attribute size: 20
47   [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - setting cluster properties from xml to: UDP(ip_mcast=true;ip_ttl=64;loopback=true;mcast_addr=228.1.2.3;mcast_port=41332;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(max_interval=20000;min_interval=10000):FD_SOCK:VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=8192;retransmit_timeout=600,1200,2400,4800;up_thread=false):UNICAST(down_thread=false;min_threshold=10;timeout=600,1200,2400;window_size=100):pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):FRAG(down_thread=false;frag_size=8192;up_thread=false):pbcast.GMS(join_retry_timeout=2000;join_timeout=5000;print_local_addr=true;shun=true):pbcast.STATE_TRANSFER(down_thread=true;up_thread=true)
78   [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - setEvictionPolicyConfig(): [config: null]
110  [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - interceptor chain is:
class org.jboss.cache.interceptors.CallInterceptor
class org.jboss.cache.interceptors.PessimisticLockInterceptor
class org.jboss.cache.interceptors.UnlockInterceptor
class org.jboss.cache.interceptors.ReplicationInterceptor
141  [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - cache mode is REPL_SYNC
1110 [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - USE_MARSHALLING is true. We will marshall/unmarshall the value.

-------------------------------------------------------
GMS: address is 192.168.1.101:2181
-------------------------------------------------------
3203 [Thread-153] INFO  org.jboss.cache.TreeCache  - viewAccepted(): [192.168.1.101:2181|0] [192.168.1.101:2181]
3219 [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - my local address is 192.168.1.101:2181
3219 [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - state could not be retrieved (must be first member in group)
3219 [http-127.0.0.1-80-1] INFO  org.jboss.cache.eviction.LRUPolicy  - Starting eviction policy using the provider: org.jboss.cache.eviction.AopLRUPolicy
3219 [http-127.0.0.1-80-1] INFO  org.jboss.cache.eviction.LRUPolicy  - Starting a eviction timer with wake up interval of (secs) 5
3219 [Thread-153] INFO  org.jboss.cache.TreeCache  - new cache is null (maybe first member in cluster)
3219 [http-127.0.0.1-80-1] INFO  org.jboss.cache.TreeCache  - Cache is started!!

2) Check the mcast listener.  It sees machine 1's messages and the correct 
cache name.

3) Deploy app to Tomcat 5.5 on machine 2.  Access the app and see TreeCacheAOP 
in action.  Check the logs and verify that jgroups has bound to an address.  I 
see log messages similar to those from machine 1.  Unfortunately, machine 2 
doesn't connect to machine 1; it forms its own single-member cluster.  I see:

3219 [Thread-153] INFO  org.jboss.cache.TreeCache  - new cache is null (maybe first member in cluster)

4) Check the mcast listener.  I now see heartbeats from both machine 1 and 
machine 2.  They both report the same cache name "RegerCom-TreeCache-Cluster".  
That makes me think they should start replicating.

But in the logs of both I see things like:

79687 [UpHandler (FD_SOCK)] WARN  org.jgroups.protocols.pbcast.NAKACK  - 192.168.1.101:2356] discarded message from non-member 192.168.1.103:1195
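
One way to check from the logs whether a node actually joined an existing 
group is to count the members in its viewAccepted() line.  Below is a rough 
sketch that parses the line format shown in the machine 1 log; ViewLine is a 
made-up helper name, and the comma-separated layout for a multi-member view 
is an assumption based on that single-member example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper for illustration; not part of JBoss Cache or JGroups.
final class ViewLine {

    // Matches e.g. "viewAccepted(): [192.168.1.101:2181|0] [192.168.1.101:2181]"
    // group(1) = view creator, group(2) = view id, group(3) = member list.
    private static final Pattern VIEW = Pattern.compile(
            "viewAccepted\\(\\):\\s*\\[([^|\\]]+)\\|(\\d+)\\]\\s*\\[([^\\]]*)\\]");

    // Returns the member addresses in the view, or an empty list if the
    // line is not a viewAccepted() line.
    static List<String> members(String logLine) {
        List<String> result = new ArrayList<String>();
        Matcher m = VIEW.matcher(logLine);
        if (!m.find()) {
            return result;
        }
        // Assumption: members are separated by commas inside the brackets.
        for (String member : m.group(3).split(",")) {
            String trimmed = member.trim();
            if (trimmed.length() > 0) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // A view with exactly one member means the node is alone in its group.
    static boolean isSingleton(String logLine) {
        return members(logLine).size() == 1;
    }
}
```

If every node reports a single-member view while the listener sees traffic 
from all of them, the nodes are reachable over multicast but have each formed 
their own group, which matches the "non-member" warning above.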

So I shut down all instances and bring them up, one at a time, about two 
minutes apart.  I've got three machines that I do this with.  They all share 
the same replSync-service.xml, which I've tweaked throughout the week:

<?xml version="1.0" encoding="UTF-8"?>
<server>
    <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
    <mbean code="org.jboss.cache.aop.TreeCacheAop"
        name="jboss.cache:service=TreeCacheAop">
        <depends>jboss:service=Naming</depends>
        <depends>jboss:service=TransactionManager</depends>
        <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
        <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
        <attribute name="CacheMode">REPL_SYNC</attribute>
        <attribute name="UseReplQueue">false</attribute>
        <attribute name="ReplQueueInterval">0</attribute>
        <attribute name="ReplQueueMaxElements">0</attribute>
        <attribute name="ClusterName">RegerCom-TreeCache-Cluster</attribute>
        <attribute name="ClusterConfig">
            <config>
                <UDP mcast_addr="228.1.2.3" mcast_port="41332"
                    ip_ttl="64" ip_mcast="true"
                    mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                    ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                    loopback="true"/>
                <PING timeout="2000" num_initial_members="3"
                    up_thread="false" down_thread="false"/>
                <MERGE2 min_interval="10000" max_interval="20000"/>
                <FD_SOCK/>
                <VERIFY_SUSPECT timeout="1500" up_thread="false"
                    down_thread="false"/>
                <pbcast.NAKACK gc_lag="50"
                    retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
                    up_thread="false" down_thread="false"/>
                <UNICAST timeout="600,1200,2400" window_size="100"
                    min_threshold="10" down_thread="false"/>
                <pbcast.STABLE desired_avg_gossip="20000" up_thread="false"
                    down_thread="false"/>
                <FRAG frag_size="8192" down_thread="false"
                    up_thread="false"/>
                <pbcast.GMS join_timeout="10000" join_retry_timeout="2000"
                    shun="true" print_local_addr="true"/>
                <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
            </config>
        </attribute>
        <attribute name="FetchStateOnStartup">true</attribute>
        <attribute name="InitialStateRetrievalTimeout">15000</attribute>
        <attribute name="SyncReplTimeout">15000</attribute>
        <attribute name="LockAcquisitionTimeout">10000</attribute>
        <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.AopLRUPolicy</attribute>
        <attribute name="EvictionPolicyConfig">
           <config>
              <attribute name="wakeUpIntervalSeconds">5</attribute>
              <region name="/_default_">
                  <attribute name="maxNodes">50000</attribute>
                  <attribute name="timeToLiveSeconds">0</attribute>
              </region>
              <region name="/usersession">
                  <attribute name="maxNodes">50000</attribute>
                  <attribute name="timeToLiveSeconds">0</attribute>
              </region>
           </config>
        </attribute>
        <attribute name="UseMarshalling">true</attribute>
    </mbean>
</server>

All machines share the same code + jars (deployed as a war file), same jvm 
version, same Tomcat 5.5.12 version, same network segment.
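
To double-check that every node really starts with the same protocol stack, 
the "setting cluster properties from xml to:" string that each node logs can 
be compared across machines.  The sketch below parses that string format into 
per-protocol properties and lists the differences.  StackString is a 
hypothetical helper written for this post, not a JGroups API, and it assumes 
property values never contain ')' or ';'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JGroups.
final class StackString {

    // A protocol name, optionally followed by "(key=value;key=value)".
    private static final Pattern PROTOCOL =
            Pattern.compile("([A-Za-z0-9_.]+)(?:\\(([^)]*)\\))?");

    // Parses "UDP(a=1;b=2):FD_SOCK:pbcast.GMS(c=3)" into protocol -> properties.
    static Map<String, Map<String, String>> parse(String stack) {
        Map<String, Map<String, String>> result =
                new LinkedHashMap<String, Map<String, String>>();
        Matcher m = PROTOCOL.matcher(stack);
        while (m.find()) {
            Map<String, String> props = new LinkedHashMap<String, String>();
            String body = m.group(2);
            if (body != null && body.length() > 0) {
                for (String pair : body.split(";")) {
                    int eq = pair.indexOf('=');
                    if (eq > 0) {
                        props.put(pair.substring(0, eq).trim(),
                                  pair.substring(eq + 1).trim());
                    }
                }
            }
            result.put(m.group(1), props);
        }
        return result;
    }

    // Lists every property whose value differs between the two stacks,
    // formatted as "PROTOCOL.key: left vs right". Property order is ignored.
    static List<String> diff(String a, String b) {
        Map<String, Map<String, String>> left = parse(a);
        Map<String, Map<String, String>> right = parse(b);
        Set<String> protocols = new LinkedHashSet<String>(left.keySet());
        protocols.addAll(right.keySet());
        List<String> out = new ArrayList<String>();
        for (String p : protocols) {
            Map<String, String> l = left.containsKey(p)
                    ? left.get(p) : Collections.<String, String>emptyMap();
            Map<String, String> r = right.containsKey(p)
                    ? right.get(p) : Collections.<String, String>emptyMap();
            Set<String> keys = new TreeSet<String>(l.keySet());
            keys.addAll(r.keySet());
            for (String k : keys) {
                String lv = l.get(k);
                String rv = r.get(k);
                if (lv == null ? rv != null : !lv.equals(rv)) {
                    out.add(p + "." + k + ": " + lv + " vs " + rv);
                }
            }
        }
        return out;
    }
}
```

As an example of what this catches: the stack string in the machine 1 log 
above shows join_timeout=5000, while the XML lists join_timeout="10000", 
presumably because the log predates a later tweak.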

About one time in five, seemingly at random, two of the machines will cluster, 
but never all three.  It's neither predictable nor stable.  I'm sure there's 
something misconfigured on my side, but I'm at a loss as to what it is.

Does anything look out of whack?  Can you point me to common clustering 
issues?  I've checked the jgroups docs for ideas.  Is it possible for two 
instances to see each other via multicast and use the same cluster name, yet 
still not cluster?  I thought that if I had the same replSync-service.xml on 
all three, they'd always cluster up.

Thanks for any help you can offer.  I'm looking forward to running 
TreeCacheAOP!  It's a great piece of work and exactly what many webapps need. 

Best,

Joe
