Re: Token/Payload API

Grant Ingersoll Tue, 15 May 2007 06:31:41 -0700

One thing that I forgot to add that is now possible, via the Payload mechanism, is based on a comment during your ApacheCon EU presentation, something to the effect that we can't score binary fields. Now with Payload scoring, a binary Field is essentially a Document level payload. It should be quite easy to implement a Query/Scorer combination that has a callback to scorePayload if people are interested in such a thing. I would propose, however, that if we go this route, we may want to overload the scorePayload method to pass in field information, i.e. field name.

And, of course, I haven't looked in depth into the FunctionQuery capabilities, which may already provide for this possibility.
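
To make the proposed overload concrete, here is a rough, self-contained sketch. It does not use Lucene's classes: PayloadScorer, BoostFieldScorer, the "boost" field name, and the float encoding of the binary value are all assumptions made up for illustration, not an existing API.

```java
import java.nio.ByteBuffer;

final class PayloadScoringSketch {

    /** Hypothetical stand-in for a similarity class with a payload callback. */
    static class PayloadScorer {
        /** Field-agnostic callback: every payload is scored as neutral (1.0). */
        float scorePayload(byte[] payload, int offset, int length) {
            return 1.0f;
        }

        /** Proposed overload: also receives the field name; delegates by default. */
        float scorePayload(String fieldName, byte[] payload, int offset, int length) {
            return scorePayload(payload, offset, length);
        }
    }

    /** Assumption: the binary value of a field named "boost" holds one big-endian float. */
    static class BoostFieldScorer extends PayloadScorer {
        @Override
        float scorePayload(String fieldName, byte[] payload, int offset, int length) {
            if ("boost".equals(fieldName) && length == Float.BYTES) {
                return ByteBuffer.wrap(payload, offset, length).getFloat();
            }
            // Any other field falls back to the field-agnostic callback.
            return super.scorePayload(fieldName, payload, offset, length);
        }
    }

    /** What a Query/Scorer pair might do: multiply a base score by the payload factor. */
    static float score(PayloadScorer scorer, float baseScore, String fieldName, byte[] binaryValue) {
        return baseScore * scorer.scorePayload(fieldName, binaryValue, 0, binaryValue.length);
    }

    public static void main(String[] args) {
        byte[] boost = ByteBuffer.allocate(Float.BYTES).putFloat(2.0f).array();
        PayloadScorer scorer = new BoostFieldScorer();
        System.out.println(score(scorer, 0.5f, "boost", boost));
        System.out.println(score(scorer, 0.5f, "thumbnail", boost));
    }
}
```

Keeping the field-agnostic method and having the new overload delegate to it by default means existing subclasses would keep working unchanged.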


Just thinking out loud,
Grant

On May 11, 2007, at 9:03 PM, Yonik Seeley wrote:

On 5/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:

> I hadn't kept up with the payload discussion/patch, and just got
> around to looking at Token.
>
> public class Token implements Cloneable {
>   String termText;        // the text of the term
>   int startOffset;        // start in source text
>   int endOffset;          // end in source text
>   String type = "word";   // lexical type
>
>  Payload payload;
>
>
> It almost feels like we are going down the road of Field, adding more
> and more to the base class instead of using some other mechanism like
> inheritance.

So PayloadToken would be more in line with what you are thinking?
Then there becomes the need to do instanceof to determine when you
have payloads?


I don't have a good answer for that one... a real inheritance solution
would be invasive to the indexing code and probably not worth it at
this point.  There is also the problem of mixing different (future)
token properties... what you really want are mixins or something.

At this point, just forget I brought it up ;-)

> A bigger problem, however, is that payloads will be lost by filters
> that aren't payload aware, and create new Tokens.  We had the same
> problem with position increments being lost.
>
> For this latter problem, I think the answer is to *not* create new
> tokens, and make all the properties of Token settable.

This seems reasonable.  I never quite understood the need to create
new tokens.  The other option may be to use a copy constructor, but
again, that seems wasteful.


We have clone() when new tokens need to be created (that's needed when
filters create more tokens, like synonym injection, etc).  Since Token
could be subclassed, that's probably the right approach.
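
A minimal sketch of the two cases discussed above: a filter that modifies a token in place (so the payload and position increment survive), and synonym injection that uses clone() for the extra token. The Token class here is a simplified stand-in for the one quoted earlier; the byte[] payload, the "synonym" type, and the helper method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TokenReuseSketch {

    /** Simplified stand-in for Token, with every property settable. */
    static class Token implements Cloneable {
        String termText;
        int startOffset;
        int endOffset;
        String type = "word";
        int positionIncrement = 1;
        byte[] payload; // stand-in for the Payload class

        Token(String termText, int startOffset, int endOffset) {
            this.termText = termText;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public Token clone() {
            try {
                // Shallow copy: fields added by a subclass are copied as well,
                // and the payload array is shared with the original.
                return (Token) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Filter that changes only the text, in place; nothing else is lost. */
    static Token lowerCase(Token t) {
        t.termText = t.termText.toLowerCase(Locale.ROOT);
        return t;
    }

    /** Synonym injection: the original token plus a clone at the same position. */
    static List<Token> injectSynonym(Token t, String synonym) {
        List<Token> out = new ArrayList<>();
        out.add(t);
        Token syn = t.clone();     // keeps offsets and payload
        syn.termText = synonym;
        syn.type = "synonym";      // hypothetical type label
        syn.positionIncrement = 0; // stacked on the original token
        out.add(syn);
        return out;
    }

    public static void main(String[] args) {
        Token t = new Token("Quick", 0, 5);
        t.payload = new byte[] {7};
        for (Token tok : injectSynonym(lowerCase(t), "fast")) {
            System.out.println(tok.termText + " " + tok.type + " payload=" + tok.payload[0]);
        }
    }
}
```

Because clone() goes through Object.clone(), a subclass of Token would come back as the same subclass without the filter needing to know about it.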

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



