uschindler commented on pull request #308: URL: https://github.com/apache/lucene/pull/308#issuecomment-922537922
> I like how it makes things easier to read. One that I'd be interested in migrating as well would be `FSIndexOutput`, though this one is more tricky since writing shorts/ints/longs to a `byte[]` occurs through several layers of abstractions.

Thanks @jpountz!

About `FSIndexOutput`: Actually, there are not too many abstractions. `FSIndexOutput` is just a thin wrapper kept for "backwards compatibility" and to work around a very old Java bug where overly large `byte[]` writes caused allocations of direct buffers. The class that has to be modified is `OutputStreamIndexOutput`, although that is not as easy as it seems. This class has no buffering of its own, so it has no `byte[]` at hand; it can only call the methods available in `OutputStream`. Because it also does checksumming, it needs to buffer, and it uses `BufferedOutputStream` for that (this spares us our own buffering, which I removed around Lucene 4/5). So it just relies on `DataOutput` to write `byte` and `byte[]` to the stream; basically nothing more is available.

Luckily, the `BufferedOutputStream` is a private implementation detail, and the class can be subclassed to get protected access to its buffer, whose behaviour is clearly documented. So we may add additional custom methods that insert the int/long/short directly into the buffer's array and advance the count of consumed bytes (that's allowed). If there is no space left in the buffer, we would simply fall back, with try-catch logic, to `DataOutput`'s original method. It's a bit of a pity that `DataOutput` is still a class and not an interface, which makes this construct hard to implement (nowadays I would make many classes in Lucene interfaces and use default methods; this is one example, but there are many others).

In short: We could do this, but it's a new feature and requires a separate issue. I have a plan for how to do it without reinventing our own buffering.

You may open a new issue, but I am not sure if this really brings much (or did you see bottlenecks on writing index files?). In addition, when doing this we can also improve `ByteArrayDataOutput`.
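
The idea described above (subclass `BufferedOutputStream`, write the primitive straight into its protected buffer, and fall back when there is no room) could look roughly like the following. This is only a minimal sketch and not the code from the PR or from Lucene: the class name `DirectBufferedOutputStream`, the method `writeIntLE`, the use of a `VarHandle` and the little-endian byte order are all assumptions made for illustration. Only the protected `buf` and `count` fields of `BufferedOutputStream` and `MethodHandles.byteArrayViewVarHandle` are real JDK API.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Lucene: a BufferedOutputStream subclass that can
// place an int directly into the inherited protected buffer.
final class DirectBufferedOutputStream extends BufferedOutputStream {

  // View a byte[] as little-endian ints (byte order is an assumption of this sketch).
  private static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  DirectBufferedOutputStream(OutputStream out, int bufferSize) {
    super(out, bufferSize);
  }

  // Not synchronized: this sketch assumes a single writer thread.
  void writeIntLE(int v) throws IOException {
    try {
      // Fast path: store all four bytes at the current buffer position.
      // The VarHandle checks bounds before writing anything, so a failed
      // attempt leaves the buffer untouched.
      VH_LE_INT.set(buf, count, v);
      count += Integer.BYTES;
    } catch (IndexOutOfBoundsException e) {
      // Slow path: fewer than four bytes are free, so write byte by byte
      // and let BufferedOutputStream flush when the buffer is full.
      write(v);
      write(v >>> 8);
      write(v >>> 16);
      write(v >>> 24);
    }
  }
}
```

An explicit `buf.length - count >= Integer.BYTES` check would work just as well as the try-catch. In a real `OutputStreamIndexOutput`, the `writeInt`/`writeLong`/`writeShort` overrides would delegate to methods like this one, and the checksum would still have to see the bytes, which the sketch leaves out.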
