On Tue, Dec 18, 2012 at 4:46 AM, Shai Erera <ser...@gmail.com> wrote:
> Are you sure that all Codecs return 1 if you indexed with DOCS_ONLY? Do we
> have a test that can trip bad Codecs?

I'm not sure!  We should add a test and fix any Codecs that fail it ...

> It may be more than just changing the documentation...

Right.

> Why would e.g. TermQuery need to write specialized code for these cases? I
> looked at TermScorer, and its freq() just returns docsEnum.freq().

I meant if we did not adopt this spec ("freq() will lie and return 1
when the field was indexed as DOCS_ONLY"), then e.g. TermQuery would
need specialized code.

> I think that Similarity may be affected? Which brings the question - how do
> Similarity impls know what flags the DE was opened with, and shouldn't they
> be specialized?
> E.g. TFIDFSimilarity.ExactTFIDFDocScorer uses the freq passed to score() as
> an index to an array, so clearly it assumes it is >= 0 and also <
> scoreCache.length.
> So I wonder what will happen to it if someone's Codec returns a
> negative value or Integer.MAX_VALUE when frequencies aren't needed?

Well, if you passed FLAG_NONE when you opened the DocsEnum then it's
your responsibility to never call freq() ... i.e., don't call freq()
and pass the result to the Similarity.
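
To make the hazard in the quoted question concrete, here is a minimal sketch of a scorer that uses freq as an array index. ScoreCacheSketch, its cache size and the sqrt-based tf are all assumptions made for illustration; this is not the TFIDFSimilarity source, only the general pattern of a freq-indexed cache.

```java
// Hypothetical stand-in for a scorer that caches scores by term frequency.
// Not Lucene code: the class name, cache size and tf formula are assumptions.
final class ScoreCacheSketch {
    private final float[] scoreCache;

    ScoreCacheSketch(int cacheSize) {
        scoreCache = new float[cacheSize];
        for (int freq = 0; freq < cacheSize; freq++) {
            scoreCache[freq] = tf(freq); // precompute scores for small frequencies
        }
    }

    // Illustrative tf function (square root of the frequency).
    private static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // The upper bound is checked, the lower bound is not, so the method
    // silently relies on freq being >= 0.
    float score(int freq) {
        return freq < scoreCache.length ? scoreCache[freq] : tf(freq);
    }

    public static void main(String[] args) {
        ScoreCacheSketch scorer = new ScoreCacheSketch(32);
        System.out.println("freq=1  -> " + scorer.score(1));
        try {
            scorer.score(-1); // what a garbage freq() value could look like
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("freq=-1 -> ArrayIndexOutOfBoundsException");
        }
    }
}
```

In this sketch a negative freq fails with an ArrayIndexOutOfBoundsException, while a huge value merely takes the uncached path. Either way the score is meaningless, which is why the caller must not feed freq() to the Similarity when frequencies were not requested.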

> I do realize that you shouldn't call Similarity with missing information,
> and TermWeight obtains a DocsEnum with frequencies, so in that regard it is
> safe.
> And if you do obtain a DocsEnum with FLAG_NONE, you'd better know what
> you're doing and not pass a random freq() to Similarity.

Right.

> I lean towards documenting the spec from above, and ensuring that all Codecs
> return 1 for DOCS_ONLY.

+1

So freq() is undefined if you had passed FLAG_NONE, and we will lie
and say freq=1 (need a test verifying this) if the field was indexed
as DOCS_ONLY.
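
A rough sketch of that contract, using toy stand-in types rather than the real DocsEnum or any Codec. ToyDocsEnum, the flag values and the choice to throw in the undefined case are all assumptions of this sketch; a real Codec is free to return anything there.

```java
// Toy model of the freq() contract discussed above.
// Hypothetical stand-ins only, not Lucene's DocsEnum or Codec classes.
final class FreqContractSketch {

    enum IndexOptions { DOCS_ONLY, DOCS_AND_FREQS }

    // Flag values chosen for this sketch.
    static final int FLAG_NONE = 0;
    static final int FLAG_FREQS = 1;

    static final class ToyDocsEnum {
        private final int[] termFreqs;      // true per-document frequencies, one per posting
        private final IndexOptions options; // how the field was indexed
        private final int flags;            // what the caller requested when opening the enum
        private int pos = -1;

        ToyDocsEnum(int[] termFreqs, IndexOptions options, int flags) {
            this.termFreqs = termFreqs;
            this.options = options;
            this.flags = flags;
        }

        // Advances to the next posting; returns false when exhausted.
        boolean next() {
            return ++pos < termFreqs.length;
        }

        int freq() {
            if ((flags & FLAG_FREQS) == 0) {
                // Undefined by the spec. This sketch chooses to fail loudly.
                throw new IllegalStateException("freq() called on an enum opened with FLAG_NONE");
            }
            if (options == IndexOptions.DOCS_ONLY) {
                return 1; // the agreed "lie": frequencies were never indexed
            }
            return termFreqs[pos];
        }
    }

    public static void main(String[] args) {
        int[] trueFreqs = {3, 7};
        ToyDocsEnum docsOnly = new ToyDocsEnum(trueFreqs, IndexOptions.DOCS_ONLY, FLAG_FREQS);
        while (docsOnly.next()) {
            System.out.println("DOCS_ONLY freq=" + docsOnly.freq());
        }
        ToyDocsEnum withFreqs = new ToyDocsEnum(trueFreqs, IndexOptions.DOCS_AND_FREQS, FLAG_FREQS);
        while (withFreqs.next()) {
            System.out.println("DOCS_AND_FREQS freq=" + withFreqs.freq());
        }
    }
}
```

The test we need would do the equivalent of the DOCS_ONLY loop in main against every real Codec and assert that each freq() is exactly 1.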

> If in the future we'll need to handle the case where someone receives a
> DocsEnum which it needs to consume, and doesn't know which flags were used
> to open it, we can always add a getFlags to DE.

Yeah ...

Mike McCandless

http://blog.mikemccandless.com

