I'm experimenting with Document boosts and I'm finding them effective for
certain types of scoring enhancements. My concern is that, because of the
way they are stored (i.e., as a single encoded byte), there are not enough
distinct boost values to cover typical boosting needs. I've written a
custom Similarity (i.e., extended Similarity, overriding encodeNorm and
decodeNorm), but that still limits me to 256 distinct boost factors. Those
256 factors have to cover the full range of the combined effect of
Document boosts, Field boosts, and length norms in the default
calculation.
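
To make this concrete, here is roughly what my encode/decode pair does,
simplified to a standalone class (the linear mapping and the
MIN_BOOST/MAX_BOOST range constants below are illustrative choices for my
data, not Lucene's stock encoding, and in my real code this logic lives in
the Similarity subclass):

  public class LinearNormCodec {
    // Made-up range for the combined docBoost * fieldBoost * lengthNorm.
    private static final float MIN_BOOST = 0.0f;
    private static final float MAX_BOOST = 16.0f;

    // Map a combined boost onto one of the 256 available byte codes.
    public static byte encodeNorm(float f) {
      float clamped = Math.max(MIN_BOOST, Math.min(MAX_BOOST, f));
      return (byte) Math.round(
          (clamped - MIN_BOOST) / (MAX_BOOST - MIN_BOOST) * 255f);
    }

    // Recover an approximation of the original float (byte is unsigned).
    public static float decodeNorm(byte b) {
      int code = b & 0xFF;
      return MIN_BOOST + (code / 255f) * (MAX_BOOST - MIN_BOOST);
    }

    public static void main(String[] args) {
      // Round-tripping shows the quantization error: at most half a
      // step, i.e. (MAX_BOOST - MIN_BOOST) / 255 / 2, about 0.031 here.
      for (float f : new float[] {0.1f, 1.0f, 2.5f, 7.3f, 15.9f}) {
        System.out.printf("%.3f -> %.3f%n", f, decodeNorm(encodeNorm(f)));
      }
    }
  }

However I choose the range and the spacing, 256 codes is the hard ceiling.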
 
I'd be interested in at least an option to store these as floats. Any
such change would affect the file formats (either a new file, or a new
format for the norm files (*.f??)).
 
Has this been considered and rejected? Clearly, an array of floats takes
four times the memory of an array of bytes, but the overall impact seems
small. Speed should not be affected, since the bytes are always decoded
to floats for use in the existing calculations anyway.
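
As a rough sanity check on the memory claim (the index sizes here are
made up; substitute your own):

  public class NormMemoryEstimate {
    public static void main(String[] args) {
      long docs = 10000000L; // assumed document count
      int fields = 5;        // assumed number of fields carrying norms
      long asBytes = docs * fields;       // 1 byte per doc per field
      long asFloats = docs * fields * 4L; // 4 bytes per doc per field
      System.out.printf("byte norms:  ~%d MB%n", asBytes >> 20);
      System.out.printf("float norms: ~%d MB%n", asFloats >> 20);
    }
  }

Even at 10M documents and 5 norm-carrying fields, that is roughly 47 MB
versus 190 MB.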
 
I hesitate to embark on a change that affects file formats, so if folks
have any suggestions on alternative approaches, I'm interested in hearing
about them.
 
Any suggestions for how to incorporate float norms while staying
consistent with the current Lucene design?
 
Thanks,
Dan
