On 29 Jan 2012, at 23:25, Andy Seaborne wrote:

> On 29/01/12 21:40, Henry Story wrote:
>> It would be better of course if the structure passed could be one that
>> was not changeable, even better, if it could use NIO bytes buffers as
>> that reduces the need even to copy data, but I guess that the Jena
>> parsers were not written with that in mind.
>
> This bit, I didn't follow.

I just discovered this, which you should find very interesting:
http://akka.io/docs/akka/2.0-M3/scala/io.html

> Parsing, in general, needs a char stream and, for Turtle, one-char lookahead.
>
> The parsers work from InputStreams. The RIOT parsers work from Tokenizers,
> which normally work from InputStreams, but that's changeable as it's Jena code.
>
> An InputStream is just an interface and a bit of machinery (AKA a trait) - it
> can be implemented over NIO buffers, so a zero-copy design is quite possible.
>
> RIOT has PeekInputStream, which could be adapted to get bytes from an NIO
> buffer.
>
> My experience is that accessing an NIO buffer byte-by-byte needs a little
> care - it may not be very cheap as several checks are always done and, while
> the JIT is good, the per-byte cost can be significant. It might be
> better to read out chunks (RIOT's InputStreamBuffered). It would still be
> zero-copy overall - no complete copy of the source taken.
>
> Copying is not always bad - I have tried to do faster-than-std-java
> conversion of UTF-8 bytes to chars in pure code, no copy, but the built-in
> decoder (which is probably native code) is still a few % better despite the
> fact it introduces a copy. CharsetDecoders work on ByteBuffers. I don't
> think it's possible in Java to avoid a copy at the point of bytes->chars.
>
> Andy

Social Web Architect
http://bblfish.net/
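
For reference, here is a minimal sketch of the two points quoted above: an InputStream implemented over an NIO buffer with chunked reads, and bytes->chars conversion through a CharsetDecoder. The names NioParsingSketch, ByteBufferInputStream and decodeUtf8 are made up for illustration; this is not RIOT's PeekInputStream or InputStreamBuffered, only plain java.nio, and it makes no claim about how the Jena parsers would actually be wired to it.

```java
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class NioParsingSketch {

    /** Hypothetical adapter (not a Jena/RIOT class): an InputStream view over a ByteBuffer. */
    public static final class ByteBufferInputStream extends InputStream {
        private final ByteBuffer buf;

        public ByteBufferInputStream(ByteBuffer source) {
            // Shares the underlying bytes (no copy of the source) but keeps its own
            // position/limit, and cannot be used to modify the data.
            this.buf = source.asReadOnlyBuffer();
        }

        @Override
        public int read() {
            // Byte-at-a-time access: every call pays the buffer's bounds checks.
            return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
        }

        @Override
        public int read(byte[] dst, int off, int len) {
            // Chunked access: one bulk get amortises the per-byte checks.
            if (len == 0) return 0;
            if (!buf.hasRemaining()) return -1;
            int n = Math.min(len, buf.remaining());
            buf.get(dst, off, n);
            return n;
        }

        @Override
        public int available() {
            return buf.remaining();
        }
    }

    /** Decodes UTF-8 with the built-in decoder; the chars land in a new CharBuffer (a copy). */
    public static String decodeUtf8(ByteBuffer bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(bytes.asReadOnlyBuffer())
                    .toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap(
                "<s> <p> <o> .".getBytes(StandardCharsets.UTF_8));

        ByteBufferInputStream in = new ByteBufferInputStream(source);
        System.out.println((char) in.read());      // one-byte read
        byte[] chunk = new byte[8];
        System.out.println(in.read(chunk, 0, 8));  // chunked read, returns bytes read

        System.out.println(decodeUtf8(source));    // source position is left untouched
    }
}
```

The stream reads through a read-only view, so the caller's buffer is neither copied nor moved; the only copy in this sketch is the one the decoder makes when it produces chars.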
