On Tue, Dec 18, 2012 at 4:46 AM, Shai Erera <ser...@gmail.com> wrote:
> Are you sure that all Codecs return 1 if you indexed with DOCS_ONLY? Do we
> have a test that can trip bad Codecs?

I'm not sure! We should make a test & fix any failing ones ...

> It may be more than just changing the documentation...

Right.

> Why would e.g. TermQuery need to write specialized code for these cases? I
> looked at TermScorer, and its freq() just returns docsEnum.freq().

I meant if we did not adopt this spec ("freq() will lie and return 1
when the field was indexed as DOCS_ONLY"), then e.g. TermQuery would
need specialized code.

> I think that Similarity may be affected? Which brings the question - how do
> Similarity impls know what flags the DE was opened with, and shouldn't they
> be specialized?
> E.g. TFIDFSimilarity.ExactTFIDFDocScorer uses the freq passed to score() as
> an index to an array, so clearly it assumes it is >= 0 and also <
> scoreCache.length.
> So I wonder what will happen to it when someone's Codec will return a
> negative value or MAX_INT in case frequencies aren't needed?

Well, if you passed FLAG_NONE when you opened the DE then it's your
responsibility to never call freq() ... i.e., don't call freq() and pass
that to the sim.

> I do realize that you shouldn't call Similarity with missing information,
> and TermWeight obtains a DocsEnum with frequencies, so in that regard it is
> safe.
> And if you do obtain a DocsEnum with FLAG_NONE, you'd better know what
> you're doing and don't pass a random freq() to Similarity.

Right.

> I lean towards documenting the spec from above, and ensuring that all Codecs
> return 1 for DOCS_ONLY.

+1

So freq() is undefined if you had passed FLAG_NONE, and we will lie and
say freq=1 (need a test verifying this) if the field was indexed as
DOCS_ONLY.

> If in the future we'll need to handle the case where someone receives a
> DocsEnum which it needs to consume, and doesn't know which flags were used
> to open it, we can always add a getFlags to DE.

Yeah ...

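The contract agreed on above can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: it does not use Lucene's classes, and ToyDocsEnum, its constructor, the flag values, and the exception thrown for FLAG_NONE are all hypothetical. Under the proposed spec freq() is simply undefined when the enum was opened with FLAG_NONE, so a real codec may return anything there; the toy throws only so that misuse is visible.

```java
public class FreqContractSketch {

    /** How the field was indexed (the two cases discussed in this thread). */
    enum IndexOptions { DOCS_ONLY, DOCS_AND_FREQS }

    // Flag values are assumptions for this toy model only.
    static final int FLAG_NONE = 0;
    static final int FLAG_FREQS = 1;

    /** Hypothetical stand-in for a codec's DocsEnum positioned on one document. */
    static final class ToyDocsEnum {
        private final IndexOptions indexOptions;
        private final int flags;
        private final int storedFreq; // ignored when the field is DOCS_ONLY

        ToyDocsEnum(IndexOptions indexOptions, int flags, int storedFreq) {
            this.indexOptions = indexOptions;
            this.flags = flags;
            this.storedFreq = storedFreq;
        }

        int freq() {
            if ((flags & FLAG_FREQS) == 0) {
                // Undefined under the proposed spec. This toy fails loudly;
                // a real codec is free to return any value here.
                throw new IllegalStateException(
                    "freq() is undefined when opened with FLAG_NONE");
            }
            if (indexOptions == IndexOptions.DOCS_ONLY) {
                return 1; // the agreed "lie": no frequencies were indexed
            }
            return storedFreq;
        }
    }

    public static void main(String[] args) {
        ToyDocsEnum docsOnly =
            new ToyDocsEnum(IndexOptions.DOCS_ONLY, FLAG_FREQS, 0);
        ToyDocsEnum withFreqs =
            new ToyDocsEnum(IndexOptions.DOCS_AND_FREQS, FLAG_FREQS, 7);
        System.out.println(docsOnly.freq());  // prints 1
        System.out.println(withFreqs.freq()); // prints 7
    }
}
```

A real test along these lines would presumably index a DOCS_ONLY field with each Codec, open a DocsEnum asking for freqs, and assert that freq() returns 1 for every document.
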
Mike McCandless

http://blog.mikemccandless.com