Hi,

On 10 December 2013 09:07, Dutch Meyer <[email protected]> wrote:
> I've been tracking an error we see occasionally on our cluster. We're
> currently running behind trunk at build
> 047b07a298d84e9755c6e06c035787ce397f4958.
>
> We've been seeing this error; it's quite rare, and so far I've had no luck
> reproducing it in a controlled setting.
>
> The symptom is that C clients see errors of the form:
>
> > ZOO_ERROR@handle_socket_error_msg@2726: Socket [10.11.13.2:2181]
> > zk retcode=-2, errno=115(Operation now in progress):
> > unexpected server response: expected 0x529a8be8, but received 0x529a8be6
>
> (Note that the expected/received entries are reversed here; we always
> receive a larger entry than we were expecting.)
>
> Kazoo clients are also failing similarly, with the error:
>
> > zookeeper: xids do not match, expected %r received %r', 1435, 1436
>
> Generally we see these failures in groups, where multiple clients will see
> them from one server over a 5- or 10-second window. Sometimes one client
> can fail with the error multiple times in that period.
>
> I'd appreciate any insight anyone can give me into why this is happening
> and how we might fix it. Has anyone seen this before? Any hunches about
> what code or conditions I might investigate to reliably trigger or fix the
> error? I'd greatly appreciate any help in identifying the problem.
>

Are you using authentication? I wonder if your read/write ops are racing
with your add_auth calls, which would cause the out-of-order xids.

-rgs
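To illustrate why a race like that would surface as an xid mismatch, here is a minimal Python sketch of a hypothetical, simplified client that pipelines requests and pairs each reply with the oldest outstanding xid in FIFO order. That strict in-order check appears to be what both error messages above are reporting. The names (PendingRequests, send, on_reply, XidMismatch) are made up for illustration; this is not the actual code of the C client or Kazoo.

```python
from collections import deque


class XidMismatch(Exception):
    """Raised when a reply's xid differs from the oldest outstanding request."""


class PendingRequests:
    """Hypothetical, simplified model of a pipelining client that expects
    replies to arrive in exactly the order the requests were sent."""

    def __init__(self, first_xid=1):
        self._next_xid = first_xid
        self._pending = deque()  # xids sent but not yet answered, oldest first

    def send(self):
        """Assign the next xid to an outgoing request and record it."""
        xid = self._next_xid
        self._next_xid += 1
        self._pending.append(xid)
        return xid

    def on_reply(self, xid):
        """Match a reply against the oldest outstanding request (FIFO)."""
        if not self._pending:
            raise XidMismatch(f"unexpected reply {xid}: nothing pending")
        expected = self._pending[0]
        if xid != expected:
            # Leave the queue untouched; the reply does not fit the FIFO order.
            raise XidMismatch(
                f"xids do not match, expected {expected} received {xid}"
            )
        self._pending.popleft()
        return xid


if __name__ == "__main__":
    client = PendingRequests(first_xid=1435)
    first = client.send()   # 1435
    second = client.send()  # 1436
    try:
        # Suppose the reply to the second request overtakes the first,
        # e.g. because the two requests raced on the server side.
        client.on_reply(second)
    except XidMismatch as exc:
        print(exc)  # xids do not match, expected 1435 received 1436
```

If a reply for xid 1436 arrives while 1435 is still outstanding, this model fails the same way the Kazoo log does, which is consistent with (but does not prove) two requests being answered out of order.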
