One thing I forgot to add: your comment during your ApacheCon EU
presentation, something to the effect that we can't score binary
fields, is now addressable via the Payload mechanism. With Payload
scoring, a binary Field is essentially a Document-level payload. It
should be quite easy to implement a Query/Scorer combination that
calls back to scorePayload, if people are interested in such a thing.
I would propose, however, that if we go this route, we overload the
scorePayload method to pass in field information, i.e. the field name.
And, of course, I haven't looked in depth into the FunctionQuery
capabilities, which may already provide for this possibility.
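A minimal Java sketch of the field-aware overload proposed above, assuming a standalone callback interface rather than Lucene's actual Similarity/Scorer classes. `PayloadScorer`, `RatingFieldScorer`, `BinaryFieldScoring`, and the "rating" field are all hypothetical names. The sketch uses a Java 8 default method for brevity, so implementations that only know the three-argument form keep working.

```java
/**
 * Hypothetical payload-scoring callback (not Lucene's API). The three-argument
 * method scores raw payload bytes; the field-aware overload is the proposal
 * and, by default, ignores the field name.
 */
interface PayloadScorer {
    float scorePayload(byte[] payload, int offset, int length);

    // Proposed overload: passes the field name so scoring can differ per field.
    default float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return scorePayload(payload, offset, length);
    }
}

/** Hypothetical: treats the first byte of a binary "rating" field as a boost in tenths. */
class RatingFieldScorer implements PayloadScorer {
    @Override
    public float scorePayload(byte[] payload, int offset, int length) {
        return 1.0f; // neutral when the field is unknown
    }

    @Override
    public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if (!"rating".equals(fieldName) || length < 1) {
            return 1.0f; // only the "rating" field contributes
        }
        return (payload[offset] & 0xFF) / 10.0f; // stored byte 25 -> 2.5f
    }
}

/** The scorer step: a binary stored field acts as a document-level payload. */
class BinaryFieldScoring {
    static float scoreDocument(PayloadScorer scorer, String fieldName,
                               byte[] binaryField, float baseScore) {
        return baseScore * scorer.scorePayload(fieldName, binaryField, 0, binaryField.length);
    }
}

class PayloadScoringDemo {
    public static void main(String[] args) {
        PayloadScorer scorer = new RatingFieldScorer();
        byte[] rating = {25};
        System.out.println(BinaryFieldScoring.scoreDocument(scorer, "rating", rating, 2.0f)); // 5.0
        System.out.println(BinaryFieldScoring.scoreDocument(scorer, "title", rating, 2.0f));  // 2.0
    }
}
```

The default method is just one way to add the overload without breaking existing implementations; how the real API should evolve is a separate decision.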
Just thinking out loud,
Grant
On May 11, 2007, at 9:03 PM, Yonik Seeley wrote:
On 5/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
> I hadn't kept up with the payload discussion/patch, and just got
> around to looking at Token.
>
> public class Token implements Cloneable {
>   String termText;        // the text of the term
>   int startOffset;        // start in source text
>   int endOffset;          // end in source text
>   String type = "word";   // lexical type
>
>   Payload payload;
>
>
> It almost feels like we are going down the road of Field, adding more
> and more to the base class instead of using some other mechanism like
> inheritance.
So a PayloadToken would be more in line with what you are thinking?
Then we would need an instanceof check to determine when we have
payloads?
I don't have a good answer for that one... a real inheritance solution
would be invasive to the indexing code and probably not worth it at
this point. There is also the problem of mixing different (future)
token properties... what you really want are mixins or something.
At this point, just forget I brought it up ;-)
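To make the inheritance alternative concrete, here is a hypothetical sketch (`BaseToken`, `PayloadToken`, and `IndexerSketch` are illustrative names, not Lucene classes) of a payload-carrying subclass and the instanceof probe that every payload-aware consumer would need.

```java
/** Simplified stand-in for the Token class (hypothetical, not Lucene's). */
class BaseToken {
    String termText;
    int startOffset;
    int endOffset;
    String type = "word";

    BaseToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

/** Hypothetical subclass that carries the payload instead of the base class. */
class PayloadToken extends BaseToken {
    byte[] payload;

    PayloadToken(String termText, int startOffset, int endOffset, byte[] payload) {
        super(termText, startOffset, endOffset);
        this.payload = payload;
    }
}

class IndexerSketch {
    /** Every consumer that cares about payloads must probe the runtime type. */
    static int payloadLength(BaseToken token) {
        if (token instanceof PayloadToken) {
            byte[] p = ((PayloadToken) token).payload;
            return p == null ? 0 : p.length;
        }
        return 0; // plain tokens have no payload
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        BaseToken plain = new BaseToken("lucene", 0, 6);
        BaseToken withPayload = new PayloadToken("lucene", 0, 6, new byte[] {1, 2});
        System.out.println(IndexerSketch.payloadLength(plain));       // 0
        System.out.println(IndexerSketch.payloadLength(withPayload)); // 2
    }
}
```

A token that later needs both a payload and some other new property would require yet another subclass, which is the mixin problem raised in the reply.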
> A bigger problem, however, is that payloads will be lost by filters
> that aren't payload-aware and that create new Tokens. We had the same
> problem with position increments being lost.
>
> For this latter problem, I think the answer is to *not* create new
> tokens, and make all the properties of Token settable.
This seems reasonable. I never quite understood the need to create
new tokens. The other option may be to use a copy constructor, but
again, that seems wasteful.
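A minimal sketch of the two filter styles under discussion, assuming a simplified token class whose properties are all settable (`MutableToken` and `LowerCaseSketch` are hypothetical, not Lucene's API). The first method builds a new token the way a payload-unaware filter might, and loses the payload and position increment; the second only updates the term text in place.

```java
import java.util.Locale;

/** Simplified token whose properties are all settable (hypothetical, not Lucene's Token). */
class MutableToken {
    private String termText;
    private int startOffset;
    private int endOffset;
    private int positionIncrement = 1;
    private byte[] payload; // null means "no payload"

    MutableToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    String getTermText() { return termText; }
    void setTermText(String termText) { this.termText = termText; }
    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
    int getPositionIncrement() { return positionIncrement; }
    void setPositionIncrement(int increment) { this.positionIncrement = increment; }
    byte[] getPayload() { return payload; }
    void setPayload(byte[] payload) { this.payload = payload; }
}

class LowerCaseSketch {
    /** Payload-unaware style: a fresh token silently drops the payload and position increment. */
    static MutableToken byNewToken(MutableToken in) {
        return new MutableToken(in.getTermText().toLowerCase(Locale.ROOT),
                                in.getStartOffset(), in.getEndOffset());
    }

    /** Settable style: only the term text changes, so every other property survives. */
    static MutableToken inPlace(MutableToken in) {
        in.setTermText(in.getTermText().toLowerCase(Locale.ROOT));
        return in;
    }
}

class SettableDemo {
    public static void main(String[] args) {
        MutableToken token = new MutableToken("Lucene", 0, 6);
        token.setPayload(new byte[] {7});
        token.setPositionIncrement(2);
        MutableToken fresh = LowerCaseSketch.byNewToken(token);
        System.out.println(fresh.getPayload() == null); // true: payload lost
        MutableToken same = LowerCaseSketch.inPlace(token);
        System.out.println(same.getPayload()[0]);       // 7: payload kept
    }
}
```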
We have clone() for when new tokens need to be created (that's needed
when filters create more tokens, such as for synonym injection). Since
Token could be subclassed, that's probably the right approach.
-Yonik
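A sketch of the clone() approach for filters that do need to emit extra tokens, such as synonym injection, assuming simplified hypothetical classes (`CopyableToken`, `TaggedToken`, `SynonymInjectionSketch`), not Lucene's Token. Because Object.clone() preserves the runtime class, a subclass and its extra state survive the copy, and so does the payload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Simplified cloneable token (hypothetical, not Lucene's Token). */
class CopyableToken implements Cloneable {
    String termText;
    int startOffset;
    int endOffset;
    int positionIncrement = 1;
    byte[] payload;

    CopyableToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public CopyableToken clone() {
        try {
            // Object.clone() copies every field and keeps the runtime subclass.
            // The copy is shallow: the payload array is shared with the original.
            return (CopyableToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class is Cloneable
        }
    }
}

/** Hypothetical subclass, to show that clone() preserves subclass state. */
class TaggedToken extends CopyableToken {
    String tag;

    TaggedToken(String termText, int startOffset, int endOffset, String tag) {
        super(termText, startOffset, endOffset);
        this.tag = tag;
    }
}

class SynonymInjectionSketch {
    /** Emits the original token followed by one clone per synonym, stacked at the same position. */
    static List<CopyableToken> inject(CopyableToken original, Map<String, List<String>> synonyms) {
        List<CopyableToken> out = new ArrayList<>();
        out.add(original);
        for (String synonym : synonyms.getOrDefault(original.termText, Collections.emptyList())) {
            CopyableToken copy = original.clone();
            copy.termText = synonym;
            copy.positionIncrement = 0; // same position as the original
            out.add(copy);
        }
        return out;
    }
}

class CloneDemo {
    public static void main(String[] args) {
        TaggedToken quick = new TaggedToken("quick", 0, 5, "ADJ");
        quick.payload = new byte[] {3};
        List<CopyableToken> tokens = SynonymInjectionSketch.inject(
                quick, Collections.singletonMap("quick", Collections.singletonList("fast")));
        for (CopyableToken t : tokens) {
            System.out.println(t.termText + " " + t.getClass().getSimpleName() + " " + t.payload[0]);
        }
    }
}
```

Since the copy is shallow, a filter that rewrites payload bytes would need to copy the array first rather than modify the shared one.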
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ