mikemccand commented on issue #12458: URL: https://github.com/apache/lucene/issues/12458#issuecomment-1653658734
The code does in fact seem to try to handle this case, when the start/end UTF-8 have different numbers of bytes, in the final `else` clause in the confusing `build` method: ``` } else { // start start(start, end, startUTF8, upto, true); // possibly middle, spanning multiple num bytes int byteCount = 1 + startUTF8.len - upto; final int limit = endUTF8.len - upto; while (byteCount < limit) { // wasteful: we only need first byte, and, we should // statically encode this first byte: tmpUTF8a.set(startCodes[byteCount - 1]); tmpUTF8b.set(endCodes[byteCount - 1]); all(start, end, tmpUTF8a.byteAt(0), tmpUTF8b.byteAt(0), tmpUTF8a.len - 1); byteCount++; } ``` Here the bug must lurk! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org