One thing I forgot to add, which is now possible via the Payload mechanism, follows from a comment during your ApacheCon EU presentation, something to the effect that we can't score binary fields. With Payload scoring, a binary Field is essentially a Document-level payload. It should be quite easy to implement a Query/Scorer combination that calls back to scorePayload, if people are interested in such a thing. I would propose, however, that if we go this route, we overload the scorePayload method to pass in field information, i.e. the field name.

And, of course, I haven't looked in depth into the FunctionQuery capabilities, which may already provide for this possibility.
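
To make the overload idea a bit more concrete, here is a rough sketch of what a field-aware scorePayload callback could look like. It is only an illustration: PayloadScorer, FieldAwarePayloadScorer, the "boost" field name, and the one-byte payload encoding are all made up for this example and are not part of the Lucene API.

```java
// Hypothetical sketch only; none of these types belong to Lucene.
interface PayloadScorer {
    // Callback that sees only the raw payload bytes.
    float scorePayload(byte[] payload, int offset, int length);

    // Proposed overload: the caller also passes the field name.
    // By default it ignores the field and delegates to the byte-only version.
    default float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return scorePayload(payload, offset, length);
    }
}

final class FieldAwarePayloadScorer implements PayloadScorer {
    @Override
    public float scorePayload(byte[] payload, int offset, int length) {
        return 1.0f; // neutral factor when nothing is known about the field
    }

    @Override
    public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        // Assumption: the hypothetical "boost" field stores one unsigned byte,
        // which is mapped linearly onto a factor in the range [1.0, 2.0].
        if ("boost".equals(fieldName) && length >= 1) {
            return 1.0f + (payload[offset] & 0xFF) / 255.0f;
        }
        return scorePayload(payload, offset, length);
    }
}
```

The point of the default delegation is that scorers that don't care about the field name keep working unchanged, while field-aware ones can interpret the bytes differently per field.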

Just thinking out loud,
Grant

On May 11, 2007, at 9:03 PM, Yonik Seeley wrote:

On 5/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:

> I hadn't kept up with the payload discussion/patch, and just got
> around to looking at Token.
>
> public class Token implements Cloneable {
>  String termText; // the text of the term
>  int startOffset; // start in source text
>  int endOffset; // end in source text
>  String type = "word"; // lexical type
>
>  Payload payload;
>
>
> It almost feels like we are going down the road of Field, adding more
> and more to the base class instead of using some other mechanism like
> inheritance.

So PayloadToken would be more in line with what you are thinking?
Then there becomes the need to do instanceof to determine when you
have payloads?

I don't have a good answer for that one... a real inheritance solution
would be invasive to the indexing code and probably not worth it at
this point.  There is also the problem of mixing different (future)
token properties... what you really want are mixins or something.

At this point, just forget I brought it up ;-)

> A bigger problem, however, is that payloads will be lost by filters
> that aren't payload aware, and create new Tokens.  We had the same
> problem with position increments being lost.
>
> For this latter problem, I think the answer is to *not* create new
> tokens, and make all the properties of Token settable.

This seems reasonable.  I never quite understood the need to create
new tokens.  The other option may be to use a copy constructor, but
again, that seems wasteful.

We have clone() when new tokens need to be created (that's needed when
filters create more tokens, like synonym injection, etc).  Since Token
could be subclassed, that's probably the right approach.
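
A minimal sketch of the two cases, assuming a stand-in token class: a filter that only changes a token mutates it in place (so payload and position increment survive), and clone() is used only when an extra token really has to be emitted. SketchToken and SketchFilters are hypothetical names for this illustration and are not Lucene's Token or TokenFilter classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token; NOT Lucene's Token class.
class SketchToken implements Cloneable {
    String termText;
    int startOffset;
    int endOffset;
    String type = "word";
    int positionIncrement = 1;
    byte[] payload; // may be null

    SketchToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Copies every property, including ones added by subclasses,
    // because Object.clone() copies all fields of the runtime class.
    @Override
    public SketchToken clone() {
        try {
            SketchToken copy = (SketchToken) super.clone();
            if (payload != null) {
                copy.payload = payload.clone(); // do not share the byte array
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}

final class SketchFilters {
    // Modifies the token in place, so no property can be lost.
    static SketchToken lowerCase(SketchToken t) {
        t.termText = t.termText.toLowerCase(Locale.ROOT);
        return t;
    }

    // Emits the original token plus a synonym at the same position.
    // The synonym starts as a clone, so it inherits offsets, type and payload.
    static List<SketchToken> injectSynonym(SketchToken t, String synonym) {
        SketchToken syn = t.clone();
        syn.termText = synonym;
        syn.positionIncrement = 0; // same position as the original
        List<SketchToken> out = new ArrayList<>();
        out.add(t);
        out.add(syn);
        return out;
    }
}
```

Because the synonym is built from clone() rather than a constructor call, a filter written this way does not need to know about payloads (or any future property) in order to preserve them.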

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


