Of course:

We are trying to search in documents that contain text in several languages. We 
are also investigating other approaches*, so this is not about finding other 
variants.
the goal is to only match tokens from 1 or more given languages and not to 
match the token if it is by accident the same in another language.

For the payloads my plan is to add the correct language to each and every token 
during indexing (I'm not sure how to solve this best, but I'm sure this can be 
solved at least with lucene directly).
On search side my current idea is to wrap around a TermPosition and skip all 
docs, where the current payload has not one of the requested languages.
I probably need to use my own Query/Weight for this?
Another approach would be to just overwrite the Similarity, but this will only 
influence scoring and depending on the underlying query not completely skip the 
token - I have to test the difference for the final score between this 
approaches.

This one blog made me curious if there is already something similar, that skips 
TermPositions based on given attributes? I could imagine something similar to 
the current Tokenattribute concept during index time, but also available during 
search and controlled by a similarity...

Jan

-----Original Message-----
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Dienstag, 23. November 2010 17:50
To: java-user@lucene.apache.org
Subject: Re: custom attributs in tokens

On Tue, Nov 23, 2010 at 4:50 PM,  <jan.kure...@nokia.com> wrote:
> Yes, payloads I will use. But they perform at score time and not at search 
> time. I just wanted to know if there is anything like that.
>
So what is the difference? Maybe you can elaborate a little what are
you trying to do?

simon
> "not even on trunk" does this mean there is a discussion about this ongoing 
> somewhere? I'm just curious.
>
> Jan
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Dienstag, 23. November 2010 16:44
> To: java-user@lucene.apache.org
> Subject: Re: custom attributs in tokens
>
> Attribute Serialization is not implemented yet, not even in trunk. You
> can use payloads instead.
>
> Simon
>
> On Tue, Nov 23, 2010 at 2:43 PM,  <jan.kure...@nokia.com> wrote:
>> Hi,
>>
>> I found a blog post from 2008 where it says, there will be additional custom 
>> attributes for tokens in the future, that will be searchable.
>> What is the status of these?
>>
>> Jan
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to