dweiss commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-734255849
I took a look out of curiosity. Here are my conclusions: - making DataInput/DataOutput know about the byte-order indeed introduces a lot of byteOrder-juggling around. I still think this is better than explicit byte-reversal wrappers because even if you need to store and pass byteOrder, you don't necessarily care about its value - it is just accepted and passed where it's consumed. It does add a lot of boilerplate, agreed. - while looking at all this I think the big design flaw is that streams opened by Directory instances are readily suitable for reading/ writing complex types... Many places that consume IndexInput/IndexOutput (or DataInput/DataOutput) are not using byte order sensitive methods at all. These places only read bytes, they wouldn't need to know about the byte order (making things much simpler). Whenever there is a need for readInt/readLong, etc. DataInput/DataOutput should be able to provide an efficient implementation depending on the byte order requested; something like: ``` ComplexTypeDataOutput getWriter(ByteOrder bo); ComplexTypeDataInput getReader(ByteOrder bo); ``` It doesn't have to be a wrapper object (but initially it can be - a common one for all DataInput/DataOutput implementations!) - the performance can be kept high in a specific implementation by the very source object implementing the byte order-sensitive interface in a specific way... What I see as a big upside of this is that any such refactoring should work for all the codecs - previous and current - by just passing the byte order to where it's relevant. You could even switch the byte order arbitrarily and (with the exception of backward-compatibility checks) all tests should still pass... Eventually, the byte order flag can be removed entirely as assertions are introduced in places where the byte order is fixed. The rest of the code would happily just use complex type read/write methods, oblivious of what the byte order in the underlying stream is. The disadvantage, of course, is that the very core Lucene API would have to change (DataInput, DataOutput, IndexInput, IndexOutput)... Do you think this makes sense? Or is Ignacio's patch a better way forward? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org