On 07/31/2012 01:58 AM, Lars Ellenberg wrote:
> Besides that a ten node cluster is likely to break the 64k message size
> limit, even after compression...

The CIB is about 20K before compression... So I think we're not in as bad a shape as I would have guessed.

>
> You probably should re-organize the code so that you only have
> one receiving ucast socket per nic/ip/port.

That would be a big change or so it seems to me. Right now, the parent code doesn't look at the parameters given to its children...

>
> But I think that a single UDP packet will be delivered to
> a single socket, even though you have 18 receiving sockets
> bound to the same port (possible because of SO_REUSEPORT, only).

I was having various troubles with the system and wasn't sure debugging was actually taking effect. But your explanation may be the right one. I will get some more time on one of the systems in the next few days and verify that.

> If we, as I think we do, receive on just one of them, where which one is
> determined by the kernel, not us, your suggested ingress filter on
> "expected" source IP would break communications.

Good point.

>
> Do you have evidence for the assumption that you receive incoming
> packets on all sockets, and not on just one of them?

I wasn't sure, actually - because of the troubles mentioned above. I'll check back in and let you know...
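
One way to check the "all sockets or just one" question outside of Heartbeat might be a small standalone test along these lines. This is only a sketch, not Heartbeat code. The helper names (`pick_free_udp_port`, `open_receivers`, `count_deliveries`) are made up for illustration. It assumes a platform where Python exposes `socket.SO_REUSEPORT` (for example Linux 3.9 or later), and since which socket receives a datagram is decided by the kernel, results on older kernels or other operating systems may differ.

```python
import select
import socket

HOST = "127.0.0.1"  # loopback only; an assumption made for this sketch


def pick_free_udp_port():
    """Ask the kernel for an unused UDP port, then release it again."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind((HOST, 0))
    port = probe.getsockname()[1]
    probe.close()
    return port


def open_receivers(count, port):
    """Bind `count` UDP sockets to the same address and port."""
    receivers = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Every socket sharing the port must set SO_REUSEPORT before bind().
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((HOST, port))
        s.setblocking(False)
        receivers.append(s)
    return receivers


def count_deliveries(num_sockets=18, num_packets=20, quiet=0.3):
    """Send unicast datagrams and count how many each receiver gets."""
    port = pick_free_udp_port()
    receivers = open_receivers(num_sockets, port)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    counts = [0] * num_sockets
    try:
        for i in range(num_packets):
            sender.sendto(b"ping %d" % i, (HOST, port))
        # Drain every socket until nothing arrives for `quiet` seconds,
        # so duplicate deliveries (if any) would be counted too.
        while True:
            ready, _, _ = select.select(receivers, [], [], quiet)
            if not ready:
                break
            for s in ready:
                s.recvfrom(2048)
                counts[receivers.index(s)] += 1
    finally:
        sender.close()
        for s in receivers:
            s.close()
    return counts


if __name__ == "__main__":
    per_socket = count_deliveries()
    print("per-socket counts:", per_socket)
    print("total received:", sum(per_socket))
```

If each datagram goes to exactly one socket, as Lars suggests, the total should equal the number of packets sent, not 18 times that number. In that case an ingress filter on the "expected" source IP applied per socket would indeed drop legitimate traffic.
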
I saw the IPC (!) having troubles on one of the systems - and the CIB was trying to send packets that were getting lost - and eventually the CIB lost its connection to Heartbeat. I could not imagine what could cause that - so this was my theory.

We had a resource that we were trying to restart but because of some disk problem it wouldn't actually restart. About this time on a different machine (the DC) we saw this IPC issue.

If you have an idea what could cause IPC to behave this way I'd be happy to know what it was...

--
Alan Robertson
al...@unix.sh
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/