mikemccand commented on issue #12458:
URL: https://github.com/apache/lucene/issues/12458#issuecomment-1653658734

   The code does seem to try to handle this case, where the start and end 
code points encode to different numbers of UTF-8 bytes, in the final `else` 
clause of the confusing `build` method:
   
   ```
       } else {
   
         // start
         start(start, end, startUTF8, upto, true);
   
         // possibly middle, spanning multiple num bytes
         int byteCount = 1 + startUTF8.len - upto;
         final int limit = endUTF8.len - upto;
         while (byteCount < limit) {
           // wasteful: we only need first byte, and, we should
           // statically encode this first byte:
           tmpUTF8a.set(startCodes[byteCount - 1]);
           tmpUTF8b.set(endCodes[byteCount - 1]);
           all(start, end, tmpUTF8a.byteAt(0), tmpUTF8b.byteAt(0), tmpUTF8a.len - 1);
           byteCount++;
         }
   ```
   
   Here the bug must lurk!
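
To make the "middle" case concrete, here is a minimal standalone sketch (not the Lucene code) of which full byte-length ranges lie strictly between a start and end code point whose UTF-8 encodings differ in length. The class `Utf8LengthSketch`, its methods, and the `MIN_CODE`/`MAX_CODE` tables are hypothetical names for illustration; the tables hold the standard UTF-8 code point boundaries and are only assumed to play the role that `startCodes`/`endCodes` play in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

public class Utf8LengthSketch {
  // Smallest and largest code point for each UTF-8 length (index = length - 1).
  // Hypothetical tables, assumed to be analogous to startCodes/endCodes above.
  static final int[] MIN_CODE = {0x0, 0x80, 0x800, 0x10000};
  static final int[] MAX_CODE = {0x7F, 0x7FF, 0xFFFF, 0x10FFFF};

  // Number of bytes UTF-8 uses to encode the given code point.
  public static int utf8Length(int codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint < 0x10000) return 3;
    return 4;
  }

  // Full code point ranges whose UTF-8 length lies strictly between
  // the length of start and the length of end.
  public static List<int[]> middleRanges(int start, int end) {
    List<int[]> ranges = new ArrayList<>();
    int startLen = utf8Length(start);
    int endLen = utf8Length(end);
    for (int len = startLen + 1; len < endLen; len++) {
      ranges.add(new int[] {MIN_CODE[len - 1], MAX_CODE[len - 1]});
    }
    return ranges;
  }

  public static void main(String[] args) {
    // 'a' is 1 byte, U+10400 is 4 bytes, so lengths 2 and 3 are "middle".
    for (int[] r : middleRanges(0x61, 0x10400)) {
      System.out.printf("U+%04X..U+%04X (%d bytes)%n", r[0], r[1], utf8Length(r[0]));
    }
  }
}
```

If start and end have the same or adjacent UTF-8 lengths, `middleRanges` returns an empty list, which corresponds to the `while` loop above not executing at all.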
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

