On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote:
> This is a relic of when Gossip was over UDP and had to worry about
> packet size.  I created
> https://issues.apache.org/jira/browse/CASSANDRA-1138 to remove those
> notifications.

Ahh, okay, well it's odd that a limit was set even with UDP.  I send large
UDP packets all the time with LWES and don't have many issues, but I'm glad
to hear it will be fixed (I may patch in a larger packet size locally as
a short-term workaround).  Looking at the code, it seems that if you hit
either of these notifications the message is not serialized (i.e., the
serialize calls return false).  Would this explain why, if I restart a
machine in the cluster in this state, it only sees some of the ring?

In other words, maybe after a fresh restart of everything the serialized
message is small enough that all 27 machines fit in it.  Once the nodes
have been running for a little while, though, the message starts to creep
over the limit, and gossiping suddenly starts to fail because responses
from some nodes are never sent.  Could that be why I start seeing
inconsistency in the rings?
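
To make the hypothesis concrete, here is a minimal sketch of the behavior I
am guessing at.  It is not the actual Cassandra code: the class name, the
per-endpoint sizes, and the 1400-byte limit are all made up for
illustration.  It only shows how a fixed size cap can let every node's
state through while the entries are small, then silently drop some nodes
once each entry grows a little.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Cassandra's implementation: models a serializer
// that stops adding endpoint state once a size estimate would exceed a cap.
final class GossipSizeSketch {

    // Returns the endpoints whose state fits before the cap is reached.
    static List<String> fitWithinLimit(List<String> endpoints,
                                       int bytesPerEndpoint,
                                       int maxBytes) {
        List<String> included = new ArrayList<>();
        int estimate = 0;
        for (String endpoint : endpoints) {
            if (estimate + bytesPerEndpoint > maxBytes) {
                // Analogous to "Breaking out to respect the MTU size":
                // every remaining endpoint is left out of the message.
                break;
            }
            estimate += bytesPerEndpoint;
            included.add(endpoint);
        }
        return included;
    }

    // Builds n made-up endpoint addresses.
    static List<String> endpoints(int n) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            result.add("10.0.0." + i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ring = endpoints(27);
        int maxBytes = 1400; // assumed cap, for illustration only

        // Small entries (e.g. right after a restart): all 27 nodes fit.
        System.out.println(fitWithinLimit(ring, 40, maxBytes).size() + " of 27 fit");
        // Larger entries after running a while: some nodes are dropped.
        System.out.println(fitWithinLimit(ring, 60, maxBytes).size() + " of 27 fit");
    }
}
```

With these made-up numbers, 27 entries of 40 bytes fit under the cap, while
at 60 bytes per entry only 23 do, so the last four nodes would never appear
in the message.  If the real code behaves like this, raising the cap should
make the missing nodes reappear.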

This hypothesis could be tested by just increasing the MAX size, so I
think I will try that.

> I think the correlation with MessageDeserializer is a red herring.
> Gossip only happens once per second so I don't see how that could back
> MD up.

Yeah, I couldn't see how either; it's just that the 'Stopping
deserialization' message made me think it might (only the nodes with a
backed-up MessageDeserializer had that message).  Do gossip messages flow
through the MessageDeserializer?

Thanks for the response,

-Anthony

> On Tue, May 25, 2010 at 5:33 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
> > Hi,
> >
> >  I just noticed I have lots of these messages
> >
> > INFO [GMFD:1] 2010-05-25 23:21:04,070 GossipDigestSynMessage.java (line 152)
> >  Remaining bytes zero. Stopping deserialization in EndPointState.
> > INFO [GMFD:1] 2010-05-25 23:21:05,224 GossipDigestSynMessage.java (line 129)
> >  @@@@ Breaking out to respect the MTU size in EPS. Estimate is 56 @@@@
> >
> > The first message only occurs on some machines in my cluster.  The second
> > on all of them.
> >
> > The ones with the first message seem to be building up quite a backlog
> > in their MessageDeserializer PendingTasks.
> >
> > I assume there is a correlation, what could be causing this sort of thing?
> >
> > This cluster is now at 27 m1.xlarge boxes on ec2 running 0.6.2 of some 
> > flavor.
> >
> > Thanks,
> >
> > -Anthony
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro                           <antho...@alumni.caltech.edu>
> >
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <antho...@alumni.caltech.edu>
