Unifying encode/decode to a single class makes sense to me. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley
On Fri, Sep 4, 2020 at 11:12 AM Gus Heck <[email protected]> wrote: > In reviewing SOLR-14787 <https://issues.apache.org/jira/browse/SOLR-14787> , > with the author verbally, he pointed out there's some duplicated code where > he needed to decode a payload into a float. We > have org.apache.lucene.analysis.payloads.PayloadHelper#decodeFloat(byte[], > int) but lucene/queries doesn't have lucene/analysis-common on the > classpath and he found no better option than to essentially cut and paste > those methods into his code. Neither of us likes that solution very much > because code duplication is not cool, so I went fishing for other options > or examples in the code base (on 8.x) and ran into > PayloadDecoder.FLOAT_DECODER... which looks quite broken, or at least > wildly different: > > PayloadDecoder FLOAT_DECODER = bytes -> bytes == null ? 1 : > bytes.bytes[bytes.offset]; > > this coerces a single byte to a float, but analysis common is encoding via > PayloadHelper like this: > > public static byte[] encodeFloat(float payload, byte[] data, int offset){ > return encodeInt(Float.floatToIntBits(payload), data, offset); > } > > Which can clearly write more than one byte... and decoding via > > public static final float decodeFloat(byte [] bytes, int offset){ > return Float.intBitsToFloat(decodeInt(bytes, offset)); > } > > public static final int decodeInt(byte [] bytes, int offset){ > return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) << > 16) > | ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] & > 0xFF); > } > > Which is clearly expecting 4 bytes... and nothing like FLOAT_DECODER > > The scary thing is PayloadDecoder.FLOAT_DECODER seems to be used in > BoostingTermBuilder... and a whole bunch of unit tests, which I presume are > passing on coincidence of handling floats that happen to qualify as small > value integers but I haven't tested that theory yet. > > PayloadDecoder.FLOAT_DECODER has been around for a long time, so I thought > I'd solicit opinions before attempting to clean it up. My concept of a > cleanup here would include trying to find a home for the float decoding > logic (from PayloadHelper) that can be seen by both areas and unifying > encode/decode for float payloads to a single location, and also add a unit > test that round trips the encode/decode cycle (which I'm not seeing). > > -Gus > > > http://www.needhamsoftware.com (work) > http://www.the111shift.com (play) >
