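[GitHub] [lucene] uschindler commented on pull request #308: LUCENE-10113: Use VarHandles to access int/long/short types in byte arrays (e.g. ByteArrayDataInput)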

GitBox Sun, 19 Sep 2021 14:14:08 -0700


uschindler commented on pull request #308:
URL: https://github.com/apache/lucene/pull/308#issuecomment-922537922



   > I like how it makes things easier to read. One that I'd be interested in 
migrating as well would be `FSIndexOutput`, though this one is more tricky 
since writing shorts/ints/longs to a `byte[]` occurs through several layers of 
abstractions.
   
   Thanks @jpountz!
   
   About `FSIndexOutput`: Actually, there aren't too many abstractions. 
`FSIndexOutput` is just a thin wrapper kept for "backwards compatibility" and to 
work around a very old Java bug where overly large `byte[]` writes caused 
allocations of direct buffers.
   
   The class that has to be modified is `OutputStreamIndexOutput`, although 
it's not as easy as it seems. This class does not have its own buffering, so it 
does not have a `byte[]` at hand; it can only call methods available in 
`OutputStream`. But because it also uses checksumming, it needs to buffer, and it 
uses `BufferedOutputStream` to do this (this spares us our own buffering, which 
I removed around Lucene 4/5). So it just relies on `DataOutput` to 
write `byte` and `byte[]` to the stream; basically nothing more is available. 
Luckily, the `BufferedOutputStream` is a private implementation detail, and the 
class can be subclassed: subclasses get protected access to its buffer, and its 
buffer behaviour is clearly documented. So we may add custom methods that insert 
the int/long/short directly into the buffer's array and update the count of 
consumed bytes (that's allowed). If there's no space left in the buffer, we would 
simply handle that with try-catch logic and fall back to `DataOutput`'s original 
method. It's a bit of a pity that `DataOutput` is still a class and not an 
interface, which makes this construct hard to implement (nowadays I would make 
many classes in Lucene interfaces and use default methods - this is one example, 
but there are many others).
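
A minimal sketch of the subclassing idea described above, not code from the PR. The class name `DirectWriteBufferedOutputStream`, the `writeInt` method, and the little-endian byte order are assumptions made for illustration. It uses only the protected `buf` and `count` fields that `BufferedOutputStream` exposes to subclasses, and it falls back to byte-wise writes when the `VarHandle` reports that the value does not fit into the remaining buffer space.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical sketch: a BufferedOutputStream that writes ints straight into its buffer. */
class DirectWriteBufferedOutputStream extends BufferedOutputStream {

  // Views a byte[] as ints; LITTLE_ENDIAN is an assumption for this sketch.
  private static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  DirectWriteBufferedOutputStream(OutputStream out, int bufferSize) {
    super(out, bufferSize);
  }

  /** Hypothetical method: writes a 4-byte int, directly into the buffer when it fits. */
  void writeInt(int v) throws IOException {
    try {
      // Fast path: store all four bytes at the current buffer position.
      VH_LE_INT.set(buf, count, v);
      count += Integer.BYTES;
    } catch (IndexOutOfBoundsException e) {
      // Fewer than four bytes are free: fall back to byte-wise writes,
      // which flush the buffer as needed. Nothing was stored by the failed set.
      write(v);
      write(v >> 8);
      write(v >> 16);
      write(v >> 24);
    }
  }
}
```

The same pattern would extend to `short` and `long` with further view handles. An explicit `buf.length - count >= Integer.BYTES` check would work as well as the try-catch; the try-catch is used here only because the comment describes it that way.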
   
   In short: We could do this, but that's a new feature and requires a separate 
issue - and I have a plan for how to do this without reinventing our own buffering. 
You may open a new issue, but I am not sure this really brings much (or did 
you see bottlenecks when writing index files?). In addition, when doing this we 
can also improve `ByteArrayDataOutput`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]




[GitHub] [lucene] uschindler commented on pull request #308: LUCENE-10113: Use VarHandles to access int/long/short types in byte arrays (e.g. ByteArrayDataInput)
