I've been tracking an error we see occasionally on our cluster. We're
currently running a build of trunk at commit
047b07a298d84e9755c6e06c035787ce397f4958, so we're somewhat behind the
latest trunk.

The error is quite rare, and so far I've had no luck reproducing it in a
controlled setting.

The symptom is that C clients see errors of the form:

  ZOO_ERROR@handle_socket_error_msg@2726: Socket [10.11.13.2:2181]
  zk retcode=-2, errno=115(Operation now in progress):
  unexpected server response: expected 0x529a8be8, but received 0x529a8be6

(Note that the expected and received values are reversed in this message:
we always receive a larger xid than the one we were expecting.)

Kazoo clients are also failing similarly, with the error:

  zookeeper: xids do not match, expected %r received %r', 1435, 1436
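
To check my reading of the symptom, here's a toy model of the check I
understand both clients perform. It's my own simplification, not Kazoo's or
the C client's actual code, and `PendingRequests` and `XidMismatch` are
made-up names. It assumes replies for a session come back in the order the
requests were sent, so each reply's xid must match the oldest outstanding
request. Under that assumption, receiving 1436 while expecting 1435 would
suggest the reply to 1435 never reached the client.

```python
from collections import deque


class XidMismatch(RuntimeError):
    """Raised when a reply's xid doesn't match the oldest pending request."""


class PendingRequests:
    """Hypothetical, simplified model of a client's outstanding-request queue.

    Assumption: the server answers a session's requests strictly in the
    order they were sent, so replies are matched FIFO against this queue.
    """

    def __init__(self, first_xid=1):
        self._next_xid = first_xid
        self._pending = deque()

    def send(self):
        """Assign the next xid to a new request and remember it as pending."""
        xid = self._next_xid
        self._next_xid += 1
        self._pending.append(xid)
        return xid

    def receive(self, reply_xid):
        """Match a reply against the oldest pending request."""
        # Assumption: negative xids are reserved for special traffic
        # (e.g. watch events, pings) that this sketch doesn't track.
        if reply_xid < 0:
            return None
        if not self._pending:
            raise XidMismatch(f"unexpected reply {reply_xid}: nothing pending")
        expected = self._pending.popleft()
        if reply_xid != expected:
            raise XidMismatch(
                f"xids do not match, expected {expected} received {reply_xid}"
            )
        return expected
```

With xids 1435 and 1436 outstanding, `receive(1436)` raises with the same
expected/received pair that Kazoo reports.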

Generally we see these failures in groups: multiple clients hit them against
the same server within a 5 to 10 second window. Sometimes a single client
fails with the error multiple times in that period.

I'd appreciate any insight into why this is happening and how we might fix
it. Has anyone seen this before? Any hunches about what code or conditions I
should investigate to reliably trigger or fix the error?

-- 
-=-Dutch
