David M. Lloyd wrote:
<snip/>
I think that using a byte[] (for instance in the encoder) and
transforming it into a ByteBuffer is another way to deal with the problem.
One important point is that ByteBuffers are just meant to contain a
fixed amount of data. A ByteBuffer is a buffer, not a data structure.
Transforming ByteBuffers to make them able to expand twists their
intrinsic semantics.
Yes, it makes far more sense to accumulate buffers until you can
decode your message from them.
Or decode the stream as it comes, creating the object on the fly. A
stateful decoder...
So I would say that BBs should be used at the very low level (reading
and sending data), while the other layers should use byte[] or a
stream of bytes.
Honestly, I don't see the advantage of using byte[] - using at least
a wrapper object seems preferable.
This is what we are doing in ADS: LDAP messages are built on the fly,
simply by working directly with the ByteBuffers.
Consider that accumulating BBs to create a big byte[] should be
understood as: transform the BBs directly into the targeted wrapper
objects.
Thanks for correcting me :)
And if you're going to use a wrapper object, why not just use ByteBuffer.
Because you may receive more than one BB before you can build the
wrapper object.
This will lead to very interesting performance questions:
- how to handle large streams of data?
One buffer at a time. :-)
Well, I tried to think about other strategies, but, eh, you are just
plain right! It's up to the codec filter to deal with the complexity of
the data it has to decode!
- should we serialize the stream at some point?
What do you mean by "serialize"?
Write to disk if the received data is too big. See my previous point
(it's up to the decoder to deal with this).
- how to write an efficient decoder, when you may receive fractions
of what you are waiting for?
An ideal decoder would be a state machine which can be entered and
exited at any state. This way, even a partial buffer can be fully
consumed before returning to wait for the next buffer.
This is what we have in ADS: a stateful decoder. Not as simple as
having the whole data in memory, especially if you have to deal with
multi-byte markers, but not too complex either.
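For illustration, here is a minimal sketch of such a resumable state
machine. The names and the protocol (a 4-byte big-endian length
followed by a payload) are made up for the example; this is not the
actual ADS/LDAP code:

import java.nio.ByteBuffer;

// A decoder that can be exited and re-entered at any state.
public class StatefulDecoder {
    private enum State { READ_LENGTH, READ_PAYLOAD }

    private State state = State.READ_LENGTH;
    private final ByteBuffer lengthBuf = ByteBuffer.allocate(4);
    private ByteBuffer payload;

    // Returns a decoded payload, or null if more data is needed.
    // Call repeatedly on the same buffer: it may hold several messages.
    public byte[] decode(ByteBuffer in) {
        while (in.hasRemaining()) {
            switch (state) {
            case READ_LENGTH:
                while (in.hasRemaining() && lengthBuf.hasRemaining()) {
                    lengthBuf.put(in.get());
                }
                if (lengthBuf.hasRemaining()) {
                    return null; // partial length field, resume later
                }
                lengthBuf.flip();
                payload = ByteBuffer.allocate(lengthBuf.getInt());
                lengthBuf.clear();
                state = State.READ_PAYLOAD;
                break;
            case READ_PAYLOAD:
                while (in.hasRemaining() && payload.hasRemaining()) {
                    payload.put(in.get());
                }
                if (payload.hasRemaining()) {
                    return null; // partial payload, resume later
                }
                state = State.READ_LENGTH;
                return payload.array();
            }
        }
        return null;
    }
}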
However, many decoders are not ideal due to various constraints. In
the worst case, you could accumulate ByteBuffer instances until you
have a complete message that can be handled. What I do at this point
is to create a DataInputStream that encapsulates all the received
buffers.
Yeah, 100% agree.
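A minimal sketch of that accumulation approach (the class name is
hypothetical; it assumes the decoder has already determined that the
list holds a complete message):

import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

// An InputStream that drains a list of accumulated ByteBuffers in
// order, so a DataInputStream can read across fragment boundaries.
public class ByteBufferInputStream extends InputStream {
    private final Iterator<ByteBuffer> buffers;
    private ByteBuffer current;

    public ByteBufferInputStream(List<ByteBuffer> accumulated) {
        this.buffers = accumulated.iterator();
    }

    @Override
    public int read() {
        while (current == null || !current.hasRemaining()) {
            if (!buffers.hasNext()) {
                return -1; // all fragments drained
            }
            current = buffers.next();
        }
        return current.get() & 0xFF;
    }
}

Then new DataInputStream(new ByteBufferInputStream(received)) gives the
decoder one continuous view over all the fragments.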
Note that a buffer might contain data from more than one message as
well. So it's important to use only a slice of the buffer in this case.
Not a big deal. Again, it's the decoder's task to handle such a case. We
have encountered such a case in LDAP too.
(This makes me think that we should describe the LDAP codec on the MINA
site, just to give some insight to people who want to write a stateful
decoder.)
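For what it's worth, extracting just the first message from such a
shared buffer boils down to something like this (a sketch;
messageLength is assumed to come from the already-decoded header):

import java.nio.ByteBuffer;

public final class Buffers {
    // Returns a view over the next messageLength bytes of 'in' and
    // advances 'in' past them, leaving the rest for the next message.
    public static ByteBuffer sliceMessage(ByteBuffer in, int messageLength) {
        ByteBuffer view = in.slice(); // shares content, own position/limit
        view.limit(messageLength);
        in.position(in.position() + messageLength);
        return view;
    }
}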
- how to write an efficient encoder when you have no idea about the
size of the data you are going to send?
Use a buffer factory, such as IoBufferAllocator, or use an even
simpler interface like this:
import java.nio.ByteBuffer;

public interface BufferFactory {
    ByteBuffer createBuffer();
}
which mass-produces pre-sized buffers. In the case of stream-oriented
systems like TCP or serial, you could probably send buffers as you
fill them. For message-oriented protocols like UDP, you can
accumulate all the buffers to send, and then use a single gathering
write to send them as a single message (yes, this stinks in the
current NIO implementation, as Trustin pointed out in DIRMINA-518, but
it's no worse than the repeated copying that auto-expanding buffers
use; and APR and other possible backends [and, if I have any say at
all in it, future OpenJDK implementations] would hopefully not suffer
from this limitation).
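A sketch of what such a gathering write could look like on a connected
DatagramChannel (the names are illustrative, not MINA code):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.List;

public final class GatheringSend {
    // Sends all accumulated buffers as one datagram with a single
    // gathering write; assumes 'channel' is already connected, so
    // write() emits exactly one message and no intermediate copy is
    // made on our side.
    public static void send(DatagramChannel channel, List<ByteBuffer> accumulated)
            throws IOException {
        channel.write(accumulated.toArray(new ByteBuffer[0]));
    }
}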
That's an idea. But this does not solve one little problem: if the reader
is slow, you may saturate the server memory with prepared BBs. So you may
need a kind of throttle mechanism, or a blocking queue, to manage this
issue: a new BB should not be created until the previous one has been
completely sent.
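One possible shape for that throttle (a sketch, not an existing MINA
class): a pool with a fixed number of buffers, where acquiring blocks
until a previously sent buffer has been released:

import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A bounded pool: at most 'capacity' buffers can be in flight at once.
// The writer blocks in acquire() until the sender releases a sent
// buffer, which throttles a fast producer against a slow reader.
public class ThrottledBufferPool {
    private final BlockingQueue<ByteBuffer> free;

    public ThrottledBufferPool(int capacity, int bufferSize) {
        free = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.add(ByteBuffer.allocate(bufferSize));
        }
    }

    // Blocks until a buffer becomes available.
    public ByteBuffer acquire() throws InterruptedException {
        return free.take();
    }

    // Called once the buffer has been completely written to the socket.
    public void release(ByteBuffer buffer) {
        buffer.clear();
        free.add(buffer);
    }
}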
For all these reasons, the mail I sent a few days ago expresses my
personal opinion that IoBuffer may be a little bit of an overkill
(remember that this class - and the associated tests - represents
around 13% of all mina-common code!)
Yes, that's very heavy. I looked at resolving DIRMINA-489 more than
once, and was overwhelmed by the sheer number of methods that had to
be implemented, and the overly complex class structure.
One option could be to use ByteBuffer with some static support methods,
+1
and streams to act as the "user interface" into collections of
buffers. For example, an InputStream that reads from a collection of
buffers, and an OutputStream that is configurable to auto-allocate
buffers, performing an action every time a buffer is filled:
import java.nio.ByteBuffer;

public interface BufferSink {
    void handleBuffer(ByteBuffer buffer);
}
That's an option.
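A sketch of what such an auto-allocating OutputStream could look like,
built on the BufferSink interface above (the bufferSize parameter and
the flush-on-full policy are assumptions):

import java.io.OutputStream;
import java.nio.ByteBuffer;

// An OutputStream that fills fixed-size buffers and hands each full
// (or explicitly flushed) buffer to a BufferSink.
public class BufferSinkOutputStream extends OutputStream {
    private final BufferSink sink;
    private final int bufferSize;
    private ByteBuffer current;

    public BufferSinkOutputStream(BufferSink sink, int bufferSize) {
        this.sink = sink;
        this.bufferSize = bufferSize;
    }

    @Override
    public void write(int b) {
        if (current == null) {
            current = ByteBuffer.allocate(bufferSize); // allocate on demand
        }
        current.put((byte) b);
        if (!current.hasRemaining()) {
            flush(); // hand off each buffer as soon as it is filled
        }
    }

    @Override
    public void flush() {
        if (current != null) {
            current.flip();
            sink.handleBuffer(current); // perform the configured action
            current = null;
        }
    }
}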
Another option is to skip ByteBuffers and go with raw byte[] objects
(though this closes the door completely to direct buffers).
Well, ByteBuffers are so intimately wired into NIO that I don't think we
can easily use byte[] without losing performance... (not sure, though...)
Yet another option is to have a simplified abstraction for byte arrays
like Trustin proposes, and use the stream classes for the buffer
state implementation.
This is all in addition to Trustin's idea of providing a byte array
abstraction and a buffer state abstraction class.
I'm afraid that offering a byte[] abstraction might lead to more
complexity, with respect to what you wrote about the way a codec should
handle data. At some point, your ideas are just the right ones, IMHO:
use BBs, and let the codec deal with them. No need to add a more
complex data structure on top of it.
Otherwise, the idea may be to define some simple codec which transforms
a BB into a byte[], for those who need it. As we have a cool filter
chain, let's use it... (see the sketch below)
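A sketch of what that conversion boils down to (written as a plain
helper rather than against a specific filter API):

import java.nio.ByteBuffer;

public final class ByteArrayCodec {
    // Copies the readable bytes of a ByteBuffer into a fresh byte[],
    // which a filter could then pass down the chain for handlers
    // that prefer arrays.
    public static byte[] toByteArray(ByteBuffer in) {
        byte[] bytes = new byte[in.remaining()];
        in.get(bytes); // consumes the buffer's readable region
        return bytes;
    }
}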
wdyt?
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org