Mikhail is right. I was getting hung up on the new API introduced in
LUCENE-3687. Instead, one could use the existing API and encode up to four
different doc length representations by packing their bytes bitwise into a
single long. Thank you, Robert Muir, for pointing this out to me!
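
A minimal sketch of that packing idea, in plain Java with no Lucene
dependency. The class name PackedNorms, the 16-bit slot width, and the slot
order are assumptions made for illustration; the thread only says that up to
four values are joined bitwise into a long.

```java
/** Hypothetical helper: packs up to four small unsigned values into one long. */
public final class PackedNorms {
    // Assumption: four slots of 16 bits each fill the 64-bit long exactly.
    private static final int SLOT_BITS = 16;
    private static final int SLOT_COUNT = 4;
    private static final long SLOT_MASK = 0xFFFFL;

    private PackedNorms() {}

    /** Packs the given values, slot 0 in the lowest 16 bits. */
    public static long pack(int... values) {
        if (values.length > SLOT_COUNT) {
            throw new IllegalArgumentException("at most " + SLOT_COUNT + " values");
        }
        long packed = 0L;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] > SLOT_MASK) {
                throw new IllegalArgumentException("value out of range: " + values[i]);
            }
            packed |= ((long) values[i]) << (i * SLOT_BITS);
        }
        return packed;
    }

    /** Reads one slot back; unsigned shift so the top slot is not sign-extended. */
    public static int unpack(long packed, int slot) {
        if (slot < 0 || slot >= SLOT_COUNT) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return (int) ((packed >>> (slot * SLOT_BITS)) & SLOT_MASK);
    }
}
```

Each similarity would then read only its own slot from the stored long, for
example PackedNorms.unpack(norm, 1) for the second encoding.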
On Sunday, October 4, 2015 6:56 AM, Ivan Provalov <[email protected]>
wrote:
Mikhail,
Thank you for your reply.
Even though this function returns a long, the value is always encoded as a
lossy single-byte representation. To change that and add other norms (so that
other similarity functions can be used on the same indexed data), there would
need to be support for multiple norms. Imagine using two similarities side by
side: a default one and an LMSimilarity with discountOverlaps set to false, or
trying a different doc length normalization where the length shouldn't be a
reciprocal square root function as in DefaultSimilarity. The only way to do
that is to store multiple norms.
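
To illustrate why a single stored value cannot serve both cases, here is a
rough sketch in plain Java with no Lucene dependency. The class and method
names are hypothetical, and the index-time boost that DefaultSimilarity also
multiplies in is left out for brevity.

```java
/** Hypothetical illustration of two doc length normalizations for one field. */
public final class DocLengthNorms {
    private DocLengthNorms() {}

    /** Number of terms counted, optionally ignoring overlapping positions. */
    public static int effectiveLength(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    /** Reciprocal square root normalization, in the style of DefaultSimilarity. */
    public static float reciprocalSqrtNorm(int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = effectiveLength(length, numOverlap, discountOverlaps);
        // Note: numTerms == 0 yields Infinity; real code would need to guard this.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Plain length, which a language-model similarity would want to keep as is. */
    public static int rawLength(int length, int numOverlap, boolean discountOverlaps) {
        return effectiveLength(length, numOverlap, discountOverlaps);
    }
}
```

The two results differ both in function (reciprocal square root versus raw
length) and possibly in the discountOverlaps setting, so each needs its own
stored value.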
Here is the LUCENE-3687 Description:
"This removes the long standing limitation that norms are a single byte. Yet,
we still need to expose this functionality to Similarity to write / encode
norms in a different format."
I am wondering if there is a plan to roll this into a release.
Thanks,
Ivan
On Saturday, October 3, 2015 11:04 PM, Mikhail Khludnev
<[email protected]> wrote:
Hello,
Norms can be long, see
org.apache.lucene.search.similarities.TFIDFSimilarity.encodeNormValue(float)
/** Encodes a normalization factor for storage in an index. */
public abstract long encodeNormValue(float f);
On Sun, Oct 4, 2015 at 6:39 AM, Ivan Provalov <[email protected]>
wrote:
When is this 4.0-ALPHA feature going to be included in a released version?
>https://issues.apache.org/jira/browse/LUCENE-3687
>It's the "Allow similarity to encode norms other than a single byte".
>
>
>I thought that it would be in the released versions, but it looks like it's
>only in 4.0-alpha. I am using 4.6.1 and also looked at the 5.3.1 source;
>neither includes the changes.
>
>
>With these changes, the new API in the Similarity class accepts the norm,
>like so: computeNorm(FieldInvertState state, Norm norm).
>
>
>Thank you,
>
>
>Ivan Provalov
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics