On 07/31/2012 01:58 AM, Lars Ellenberg wrote:
> Besides that, a ten-node cluster is likely to break the 64k message size
> limit, even after compression...
The CIB is about 20K before compression, so I think we're not in as
bad a shape as I would have guessed.
>
> You probably should re-organize the code so that you only have
> one receiving ucast socket per nic/ip/port.
That would be a big change, or so it seems to me. Right now, the parent
code doesn't look at the parameters given to its children...
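A rough sketch of the reorganization Lars suggests, assuming the parent could see each child's (interface, peer IP) pair: group the ucast directives by interface and port, and open one receiving socket per group instead of one per directive. The function name `plan_ucast_receivers`, the interface names, the addresses, and port 694 are all hypothetical and chosen only for illustration; nine peers on two NICs is just one plausible way to arrive at the 18 receiving sockets mentioned above.

```python
from collections import defaultdict


def plan_ucast_receivers(ucast_directives, udpport):
    """Group hypothetical (interface, peer_ip) ucast directives so that only
    one receiving socket is needed per (interface, port).

    Returns a dict mapping (interface, port) -> set of peer IPs expected there.
    """
    plan = defaultdict(set)
    for dev, peer_ip in ucast_directives:
        plan[(dev, udpport)].add(peer_ip)
    return dict(plan)


# Hypothetical ten-node cluster with two NICs: 9 peers x 2 interfaces.
subnets = {"eth0": "10.0.0", "eth1": "10.0.1"}
directives = [(dev, f"{net}.{n}") for dev, net in subnets.items() for n in range(2, 11)]
receivers = plan_ucast_receivers(directives, udpport=694)

if __name__ == "__main__":
    print(len(directives), "directives ->", len(receivers), "receiving sockets")
```

With one socket per interface, any ingress check on the source address would have to accept the whole set of peers expected on that interface rather than a single peer.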
>
> But I think that a single UDP packet will be delivered to
> a single socket, even though you have 18 receiving sockets
> bound to the same port (possible because of SO_REUSEPORT, only).
I was having various troubles with the system and wasn't sure my debugging
was actually taking effect. But your explanation may be the right one.
I will get some more time on one of the systems in the next few days and
verify that.
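One way to check this on a test box, sketched under assumptions: bind several UDP sockets to the same loopback port with `SO_REUSEPORT`, send a single datagram to that port, and count how many of the sockets actually receive it. This assumes a Linux kernel with `SO_REUSEPORT` support (3.9 or later); other kernels and operating systems may behave differently, so the result only says something about the machine it runs on. The helper name `count_receivers` is hypothetical and is not part of Heartbeat.

```python
import select
import socket


def count_receivers(nsock=4, host="127.0.0.1"):
    """Bind nsock UDP sockets to one port with SO_REUSEPORT, send one
    datagram to that port, and return how many sockets received it."""
    receivers = []
    try:
        port = 0  # the first bind lets the kernel pick a free port
        for _ in range(nsock):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            receivers.append(s)
            # SO_REUSEPORT must be set on every socket before bind().
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            port = s.getsockname()[1]  # later sockets reuse this port

        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(b"ping", (host, port))

        # Wait up to 0.5 s for the datagram, then drain every readable socket.
        readable, _, _ = select.select(receivers, [], [], 0.5)
        got = 0
        for s in readable:
            s.setblocking(False)
            try:
                s.recv(64)
                got += 1
            except BlockingIOError:
                pass
        return got
    finally:
        for s in receivers:
            s.close()


if __name__ == "__main__":
    # 18 sockets, matching the number of receivers mentioned in the thread.
    print(count_receivers(18))
```

If the kernel delivers each unicast datagram to only one socket in the group, as Lars expects, this should report 1 rather than the number of sockets.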
> If we, as I think we do, receive on just one of them, where which one is
> determined by the kernel, not us, your suggested ingress filter on
> "expected" source IP would break communications.
Good point.
>
> Do you have evidence for the assumption that you receive incoming
> packets on all sockets, and not on just one of them?
I wasn't sure, actually, because of the troubles mentioned above. I'll
check back and let you know...

I saw IPC (!) having trouble on one of the systems: the CIB was trying
to send packets that were getting lost, and eventually the CIB lost its
connection to Heartbeat. I could not imagine what could cause that, so
this was my theory. We had a resource that we were trying to restart,
but because of a disk problem it wouldn't actually restart. At about
the same time, on a different machine (the DC), we saw this IPC issue.

If you have any idea what could cause IPC to behave this way, I'd be
happy to hear it...

    -- Alan Robertson
       al...@unix.sh
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
