Hi,

On 10 December 2013 09:07, Dutch Meyer wrote:

> I've been tracking an error we see occasionally on our cluster. We're
> currently running a build behind trunk, at commit
> 047b07a298d84e9755c6e06c035787ce397f4958.
>
> The error is quite rare, and so far I've had no luck reproducing it in a
> controlled setting.
>
> The symptom is that C clients see errors of the form:
>
>   ZOO_ERROR@handle_socket_error_msg@2726: Socket [10.11.13.2:2181]
>   zk retcode=-2, errno=115(Operation now in progress):
>   unexpected server response: expected 0x529a8be8, but received 0x529a8be6
>
> (Note that the expected and received values are swapped in this message;
> in practice we always receive a larger xid than we were expecting.)
>
> Kazoo clients are also failing similarly, with the error:
>
>   zookeeper: xids do not match, expected %r received %r', 1435, 1436
>
> Generally these failures come in groups: multiple clients see them from
> one server over a five- or ten-second window, and sometimes a single
> client fails with the error several times in that period.
>
> I'd appreciate any insight into why this is happening and how we might
> fix it. Has anyone seen this before? Any hunches about what code or
> conditions I might investigate to reliably trigger or fix the error?
> I'd greatly appreciate any help in identifying the problem.
>

Are you using authentication? I wonder if your read/write ops are racing
with your add_auth calls, which would cause the out-of-order xids.
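To make the suspected race concrete: a client that pairs each reply with its oldest outstanding request breaks as soon as two threads can assign xids, queue requests, and write them to the socket in different orders. The Python sketch below is a toy model of that hypothesis only, not ZooKeeper's or Kazoo's actual code. The thread names, the three step names, and the starting xid 1435 (borrowed from the Kazoo error above) are all hypothetical, and the model server simply answers in the order requests arrived on the wire.

```python
from collections import deque


class XidMismatch(Exception):
    """Raised when a reply's xid differs from the oldest outstanding request."""

    def __init__(self, expected: int, received: int) -> None:
        super().__init__(f"xids do not match, expected {expected} received {received}")
        self.expected = expected
        self.received = received


def interleave(events, first_xid=1435):
    """Replay one interleaving of per-thread steps (hypothetical model).

    Each event is (thread, step), where step is one of:
      "assign"  - take the next xid from the shared counter
      "enqueue" - append that xid to the pending (awaiting-reply) queue
      "send"    - write that request to the socket
    Returns (pending_order, wire_order).
    """
    next_xid = first_xid
    xid_of = {}
    pending, wire = [], []
    for thread, step in events:
        if step == "assign":
            xid_of[thread] = next_xid
            next_xid += 1
        elif step == "enqueue":
            pending.append(xid_of[thread])
        elif step == "send":
            wire.append(xid_of[thread])
        else:
            raise ValueError(f"unknown step {step!r}")
    return pending, wire


def match_replies(pending, replies):
    """Pair each reply with the oldest pending request, FIFO-style."""
    queue = deque(pending)
    for received in replies:
        expected = queue.popleft()
        if received != expected:
            raise XidMismatch(expected, received)


# Each thread runs assign -> enqueue -> send without interruption.
serial = [("A", "assign"), ("A", "enqueue"), ("A", "send"),
          ("B", "assign"), ("B", "enqueue"), ("B", "send")]
# Thread B slips its socket write in ahead of thread A's.
racy = [("A", "assign"), ("A", "enqueue"),
        ("B", "assign"), ("B", "enqueue"), ("B", "send"),
        ("A", "send")]

pending, wire = interleave(serial)
match_replies(pending, wire)  # no error: queue order equals wire order

pending, wire = interleave(racy)
try:
    match_replies(pending, wire)
except XidMismatch as err:
    print(err)  # xids do not match, expected 1435 received 1436
```

In the racy interleaving the reply for xid 1436 arrives while 1435 is still at the head of the pending queue, so the client sees a larger xid than it expected, which matches the direction of the errors reported above. The model only shows why such an ordering gap would produce this symptom; whether add_auth actually opens that gap is the open question.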


-rgs
