Markus - how are you encoding payloads as bitsets and using them for scoring?
Curious to see how folks are leveraging them.
Erik
> On Jun 14, 2017, at 4:45 PM, Markus Jelsma <[email protected]> wrote:
>
> Hello,
>
> We use POS tagging too, and encode the tags as payload bitsets for scoring,
> which is, as far as I know, the only possibility with payloads.
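
The message does not say how the bitsets are laid out. As one hypothetical layout, each tag could get its own bit in a one-byte payload, and a scorer could add up the weights of the bits that are set. The tag list, the weights and the function names below are assumptions made for illustration, not Lucene API and not necessarily what Markus does.

```python
# Hypothetical tag inventory; a real index would use its tagger's full tag set.
TAGS = ["DT", "JJ", "NN", "NP", "VB"]
BIT = {tag: 1 << i for i, tag in enumerate(TAGS)}


def encode(tags):
    """Pack a collection of POS tags into a single payload byte."""
    mask = 0
    for tag in tags:
        mask |= BIT[tag]
    return bytes([mask])


def has_tag(payload, tag):
    """Check whether the bit for the given tag is set in the payload."""
    return bool(payload[0] & BIT[tag])


def boost(payload, weights):
    """Sum the weights of every tag whose bit is set in the payload."""
    return sum(w for tag, w in weights.items() if has_tag(payload, tag))


payload = encode(["DT", "NN"])
score = boost(payload, {"NN": 2.0, "JJ": 0.5})
```

With five tags the mask fits in one byte; a larger tag set would need a wider payload.
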
>
> So, instead of encoding them as payloads, why not index your treebank's
> POS tags as tokens at the same position, like synonyms? If you do that, you
> can use spans and phrase queries to find chunks of multiple POS tags.
>
> This would be the first approach I can think of. Treating them as regular
> tokens enables you to use regular search for them.
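
To make the same-position idea concrete, here is a toy positional index in plain Python rather than Lucene itself. Each word and its POS tag are stored at the same position, so a phrase-style lookup can mix tags and words freely. The "pos:" prefix and the function names are assumptions made for this sketch; in Lucene the tag token would be emitted with a position increment of 0, the way synonym tokens are.

```python
from collections import defaultdict


def build_index(tagged_tokens):
    """Map each term to the set of positions where it occurs.

    The word and its POS tag share a position, the way a synonym would.
    The "pos:" prefix is a made-up convention to keep tags apart from words.
    """
    index = defaultdict(set)
    for position, (word, tag) in enumerate(tagged_tokens):
        index[word.lower()].add(position)
        index["pos:" + tag].add(position)
    return index


def phrase_positions(index, terms):
    """Return start positions where the terms occur at consecutive positions."""
    starts = index.get(terms[0], set())
    return sorted(
        start
        for start in starts
        if all(
            start + offset in index.get(term, set())
            for offset, term in enumerate(terms)
        )
    )


sentence = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VB")]
index = build_index(sentence)
np_starts = phrase_positions(index, ["pos:DT", "pos:JJ", "pos:NN"])
```

A fixed tag sequence like this maps onto a phrase query. Optional or repeated tags, as in a full chunk pattern, would have to be expanded into several such sequences or expressed with span queries.
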
>
> Regards,
> Markus
>
>
>
> -----Original message-----
>> From: José Tomás Atria <[email protected]>
>> Sent: Wednesday 14th June 2017 22:29
>> To: [email protected]
>> Subject: Using POS payloads for chunking
>>
>> Hello!
>>
>> I'm not particularly familiar with Lucene's search API (as I've been using
>> the library mostly as a dumb index rather than a search engine), but I am
>> almost certain that, using its payload capabilities, it would be trivial to
>> implement a regular chunker to look for patterns in sequences of payloads.
>>
>> (trying not to be too pedantic, a regular chunker looks for 'chunks' based
>> on part-of-speech tags, e.g. noun phrases can be searched for with patterns
>> like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero or
>> more adjectives preceding a bunch of nouns, etc)
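
Outside of any index, the pattern above can be sketched as an ordinary regular expression run over a sentence's tag sequence. This is a minimal illustration in plain Python, not a Lucene query; the function name and the sample sentence are made up for the example.

```python
import re

# The noun-phrase pattern from the message. Every tag is followed by one space
# so that "NN" cannot match the start of a longer tag, and the lookbehind makes
# sure a match begins at the start of a tag.
NP_PATTERN = re.compile(r"(?<!\S)(?:DT )?(?:JJ )*(?:(?:NN|NP) )+")


def find_chunks(tagged_tokens):
    """Return the word sequences whose tags match the noun-phrase pattern."""
    tags = "".join(tag + " " for _, tag in tagged_tokens)
    chunks = []
    for match in NP_PATTERN.finditer(tags):
        # Each tag ends with exactly one space, so the number of spaces before
        # a character offset equals the number of whole tokens before it.
        start = tags.count(" ", 0, match.start())
        end = tags.count(" ", 0, match.end())
        chunks.append([word for word, _ in tagged_tokens[start:end]])
    return chunks


sentence = [
    ("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VB"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
chunks = find_chunks(sentence)
```

This only works on one sentence's tags held in memory. Doing the same against an index means expressing the pattern in terms the index can evaluate.
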
>>
>> Assuming my index has POS tags encoded as payloads for each position, how
>> would one search for such patterns, irrespective of terms? I started
>> studying the spans search API, as this seemed like the natural place to
>> start, but I quickly got lost.
>>
>> Any tips would be extremely appreciated. (or references to this kind of
>> thing, I'm sure someone must have tried something similar before...)
>>
>> thanks!
>> ~jta
>> --
>>
>> sent from a phone. please excuse terseness and tpyos.
>>
>> enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>