[GitHub] [lucene-solr] dweiss commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

GitBox Thu, 26 Nov 2020 03:55:37 -0800


dweiss commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-734255849



   I took a look out of curiosity. Here are my conclusions:
   
   - making DataInput/DataOutput know about the byte-order indeed introduces a 
lot of byteOrder-juggling around. I still think this is better than explicit 
byte-reversal wrappers because even if you need to store and pass byteOrder, 
you don't necessarily care about its value - it is just accepted and passed 
where it's consumed. It does add a lot of boilerplate, agreed.
   
   - while looking at all this I think the big design flaw is that streams 
opened by Directory instances are readily suitable for reading/ writing complex 
types... Many places that consume IndexInput/IndexOutput (or 
DataInput/DataOutput) are not using byte order sensitive methods at all. These 
places only read bytes, they wouldn't need to know about the byte order (making 
things much simpler). Whenever there is a need for readInt/readLong, etc. 
DataInput/DataOutput should be able to provide an efficient implementation 
depending on the byte order requested; something like:
   ```
   ComplexTypeDataOutput getWriter(ByteOrder bo);
   ComplexTypeDataInput getReader(ByteOrder bo);
   ```
   It doesn't have to be a wrapper object (but initially it can be - a common 
one for all DataInput/DataOutput implementations!) - the performance can be 
kept high in a specific implementation by the very source object implementing 
the byte order-sensitive interface in a specific way...
   
   What I see as a big upside of this is that any such refactoring should work 
for all the codecs - previous and current - by just passing the byte order to 
where it's relevant. You could even switch the byte order arbitrarily and (with 
the exception of backward-compatibility checks) all tests should still pass... 
Eventually, the byte order flag can be removed entirely as assertions are 
introduced in places where the byte order is fixed. The rest of the code would 
happily just use complex type read/write methods, oblivious of what the byte 
order in the underlying stream is.
   
   The disadvantage, of course, is that the very core Lucene API would have to 
change (DataInput, DataOutput, IndexInput, IndexOutput)...
   
   Do you think this makes sense? Or is Ignacio's patch a better way forward? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] dweiss commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

Reply via email to