Hi,

Today TfIdfSimilarity forces the encoding of norms into a single byte, and there's no way to override it. For example, if I don't want to lose precision, the only thing I can do is write a different Similarity and copy most of the code from TfIdfSimilarity.
I looked at the code and was wondering whether we could move this byte encoding to DefaultSimilarity, while making encodeNorm and decodeNorm take/return long values, since Sim.computeNorm eventually returns a long value anyway. That way, DefaultSim would document the byte encoding it uses (moving the respective javadocs from TfIdfSim to DefaultSim), while allowing one to extend TfIdfSim without losing that much precision and without copying the entire class.

While we're at it, I read LUCENE-1907 about queryNorm, and I think this is important too. So whether we introduce a queryNorm or make computeWeight non-final, to allow returning other SimWeight impls, I think we should allow that extension.

I was about to open a JIRA issue to handle this until I noticed the explicit documentation in TfIdfSim (re the byte encoding), so I thought I'd ask first whether anybody objects.

Back-compat wise, we will need to break it (by changing the encode/decodeNorm signatures), but I think that's ok in this case. Overriding TfIdfSim is expert enough IMO.

Shai
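
To make the norm-encoding part of the proposal concrete, here is a rough, self-contained sketch of the shape I have in mind. It does not use the real Lucene classes: SketchTfIdfSim, SketchByteSim and SketchFullPrecisionSim are hypothetical names, the computeNorm signature is simplified, and the byte quantization is a toy stand-in rather than the encoding Lucene actually uses.

```java
// Hypothetical sketch only; these are NOT the real Lucene classes or signatures.

/** Stand-in for TfIdfSim: norm encoding is abstract and works on long values. */
abstract class SketchTfIdfSim {
    /** Encodes a float norm into the long value that would be stored. */
    abstract long encodeNorm(float norm);

    /** Decodes a stored long value back into a float norm. */
    abstract float decodeNorm(long stored);

    /** Simplified norm computation; the result is whatever encodeNorm produces. */
    final long computeNorm(int fieldLength, float boost) {
        float norm = boost / (float) Math.sqrt(fieldLength);
        return encodeNorm(norm);
    }
}

/** Stand-in for DefaultSim: owns (and would document) the lossy one-byte encoding. */
class SketchByteSim extends SketchTfIdfSim {
    @Override
    long encodeNorm(float norm) {
        // Toy quantization: clamp to [0, 1] and map onto 256 levels.
        float clamped = Math.max(0f, Math.min(1f, norm));
        return Math.round(clamped * 255f);
    }

    @Override
    float decodeNorm(long stored) {
        return (stored & 0xFF) / 255f;
    }
}

/** An expert subclass that keeps every bit of the float norm. */
class SketchFullPrecisionSim extends SketchTfIdfSim {
    @Override
    long encodeNorm(float norm) {
        return Float.floatToIntBits(norm);
    }

    @Override
    float decodeNorm(long stored) {
        return Float.intBitsToFloat((int) stored);
    }
}
```

With this split, new SketchFullPrecisionSim().decodeNorm(encodeNorm(x)) returns x exactly, while SketchByteSim keeps the single-byte behaviour, and neither subclass has to copy the rest of the base class.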
