OK, I think I get it. The rough sanity check is applied only when
deserializing, where the length of the incoming buffer is read and
checked. There is no check for outgoing data when serializing. And
there are tens of bytes of serialization metadata, so if a client
writes just below 1 MB (1024 * 1024 - 1 bytes), the final incoming
buffer would exceed 1 MB and the write would fail. So it's somewhat
inaccurate to say that by default zk can store a znode just below 1 MB
(1024 * 1024 - 1 bytes). To make it accurate, maybe we could check the
byte length before serializing, and the server could allow some extra
bytes of headroom beyond 1 MB. I guess this is minor, as it is just a
rough sanity check. ZK just does not expect a client to write data
that large :)
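To make the idea concrete, the client-side pre-serialization check could look roughly like the sketch below. This is only an illustration: the class name, method name, and the size of the metadata margin are assumptions of mine, not part of the ZooKeeper client API.

```java
import java.io.IOException;

// Hypothetical client-side pre-check sketch; the class, method, and
// METADATA_MARGIN value are illustrative assumptions, not ZooKeeper API.
public final class PayloadSizeCheck {
    // Same default as the server-side limit (jute.maxbuffer): 1 MB.
    static final int JUTE_MAX_BUFFER =
            Integer.getInteger("jute.maxbuffer", 1024 * 1024);

    // Rough allowance for the request's serialization metadata (path,
    // ACLs, headers); tens of bytes in practice, padded generously here.
    static final int METADATA_MARGIN = 1024;

    /** Fails the write locally, before the request is ever sent. */
    public static void checkPayload(byte[] data) throws IOException {
        if (data != null && data.length > JUTE_MAX_BUFFER - METADATA_MARGIN) {
            throw new IOException("Packet size too large: payload of "
                    + data.length + " bytes would exceed jute.maxbuffer ("
                    + JUTE_MAX_BUFFER + ") once metadata is added");
        }
    }
}
```

With a check like this, a 1024 * 1024 - 1 byte payload would be rejected locally with a descriptive message instead of causing a "Len error" and connection close on the server.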

On Fri, Jan 8, 2021 at 12:49 AM Huizhi Lu <ihuizhi...@gmail.com> wrote:
>
> From what I've learned and also from the docs:
>
> "jute.maxbuffer : (Java system property:jute.maxbuffer).
>
> When jute.maxbuffer in the client side is greater than the server
> side, the client wants to write the data exceeds jute.maxbuffer in the
> server side, the server side will get java.io.IOException: Len error
> When jute.maxbuffer in the client side is less than the server side,
> the client wants to read the data exceeds jute.maxbuffer in the client
> side, the client side will get java.io.IOException: Unreasonable
> length or Packet len is out of range!"
>
> So I assume: the client only honors jute.maxbuffer when reading. If
> the client tries to read data > jute.maxbuffer, it fails. For
> writing, jute.maxbuffer is honored on the server side; the client does
> not do the sanity check.
> Correct me if I am wrong. I would really expect the client to also
> fail the request if it's writing data > jute.maxbuffer.
>
> On Fri, Jan 8, 2021 at 12:40 AM Huizhi Lu <ihuizhi...@gmail.com> wrote:
> >
> > Hi Ted,
> >
> > Really appreciate your prompt response and detailed explanation!
> >
> > For various reasons, ZK can end up being abused for writing large data
> > objects. I understand we should use ZK correctly, for the coordination
> > work it is best at. It's definitely something we could improve in how
> > we use ZK, but it may take a long time to get there.
> > Thanks for the clarification :)
> >
> > Back to the jute.maxbuffer setting. With the consistent value of 1 MB
> > on both client and server, I am still able to reproduce it: the
> > request is sent to the server, which throws IOException "Len error"
> > and closes the connection. The client log is below; it does not give
> > descriptive enough info like "Len error".
> > [main-SendThread(localhost:2181)] WARN
> > org.apache.zookeeper.ClientCnxn - Session 0x1003e7613ab0005 for server
> > localhost:2181, Closing socket connection. Attempting reconnect except
> > it is a SessionExpiredException.
> > java.io.IOException: Connection reset by peer
> >
> > With this, can I assume the zk client does not fail the request?
> > I also dug into the code; it seems the request reaches the server
> > and it is the server that fails the request.
> > I was actually expecting the request to fail earlier, on the client
> > side, and to get descriptive info like "the packet size is too
> > large".
> > Is this (the client's jute.maxbuffer not being honored when writing)
> > expected?
> > I think it would be great if the client side failed the request and
> > gave more descriptive info or a specific exception; that's what I
> > would expect.
> >
> > -Huizhi
> >
> > On Fri, Jan 8, 2021 at 12:01 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > > Let's be clear from the start, storing large data objects in Zookeeper is
> > > strongly discouraged. If you want to store large objects with good
> > > consistency models, store the data in something else (like a distributed
> > > file system or key value store), commit the data and then use ZK
> > > to provide a reference to that data. Zookeeper is intended for
> > > coordination, not data
> > > storage. It is not a reasonable alternative to a noSQL database.
> > >
> > > That said, good and informative error messages are always useful and are
> > > better than anonymous errors. Even if the connection has to be closed
> > > because of the error, it would be nice to give some decent feedback.
> > >
> > > On the other hand, the error that you are seeing sounds like your client
> > > and your server have inconsistent settings for the maximum jute buffer
> > > size. If the client has a larger setting than the server, then the server
> > > will run out of buffer space before reading the entire request. To the
> > > server, this will look like a network error and there is little that the
> > > server can do to recover the connection safely because some bytes may
> > > already have been lost due to the short read. As such closing the
> > > connection is pretty much all that can be done.
> > >
> > > If buffer lengths on client and server match and the client tries a long
> > > write, I believe the write will fail on the client side with a much more
> > > descriptive message.
> > >
> > > One thing that could plausibly be done would be to enhance the initial
> > > handshake between client and server so that mismatches in buffer sizes
> > > are detected more aggressively. Since a length could be exchanged in a
> > > fixed size, this could be done while keeping the connection healthy,
> > > which would, in turn, allow a useful error message that points to the
> > > true source of the error (i.e. the configuration). This would, however,
> > > require a protocol change, which is always a sensitive change.
> > >
> > > Can you determine if the root cause of your problem is inconsistent
> > > settings between client and server?
> > >
> > >
> > >
> > > On Thu, Jan 7, 2021 at 10:27 PM Huizhi Lu <h...@apache.org> wrote:
> > >
> > > > Hi ZK Experts,
> > > >
> > > > I would like to ask a quick question. As we know, assuming we are
> > > > using the default 1 MB jute.maxbuffer, if a zk client tries to write
> > > > a znode larger than 1 MB, the server will fail it. The server will
> > > > log "Len error" and close the connection, and the client will receive
> > > > a connection loss. In a third-party ZkClient lib (e.g. I0Itec's
> > > > ZkClient), the operation will keep being retried upon connection
> > > > loss, and this endless retrying might have a chance to take down the
> > > > zk server.
> > > >
> > > > I believe the zk community must have considered such a situation. I
> > > > wonder why the zk server does not handle the error a bit better and
> > > > send a clearer response to the client, e.g.
> > > > KeeperException.PacketLenError (the zk server does not really have to
> > > > close the connection), so the client knows the error is
> > > > non-retryable. There must be some reasons I am not aware of that zk
> > > > does not offer it, so I'd like to ask here. Or is there any
> > > > ticket/email thread that has discussed this?
> > > >
> > > > Maybe zk would expect the app client to handle connection loss
> > > > appropriately, e.g. by having a retry strategy (backoff retry,
> > > > limiting the retry count, etc.). Is this what zk would expect,
> > > > instead of returning a PacketLenError exception?
> > > >
> > > > Really appreciate any input.
> > > >
> > > >
> > > > Best,
> > > > -Huizhi
> > > >
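P.S. On the retry question quoted above: until the client can distinguish a too-large packet from a transient connection loss, a bounded backoff wrapper on the application side avoids the retry-forever failure mode. This is only a sketch; the class and method names are invented, not from ZooKeeper or any ZkClient library.

```java
import java.util.concurrent.Callable;

// Sketch of a bounded exponential-backoff retry wrapper; names are
// hypothetical and not part of ZooKeeper or any ZkClient library.
public final class BoundedRetry {
    /** Retries op up to maxRetries times, backing off exponentially. */
    public static <T> T withBackoff(Callable<T> op, int maxRetries,
                                    long baseDelayMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    // Exponential backoff with a 10 s cap, rather than
                    // hammering the server in a tight loop.
                    Thread.sleep(Math.min(baseDelayMs << attempt, 10_000L));
                }
            }
        }
        throw last; // give up after maxRetries instead of retrying forever
    }
}
```

A bounded retry count plus backoff means a permanently failing write (like an oversized znode) surfaces an exception to the caller instead of taking down the server with endless reconnect attempts.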
