New Token filter for adding payloads "in-stream"
------------------------------------------------
Key: LUCENE-1676
URL: https://issues.apache.org/jira/browse/LUCENE-1676
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 2.9
This TokenFilter is able to split a token based on a delimiter and use one part
as the token and the other part as a payload. This allows someone to include
payloads inline with tokens (presumably set up by a pipeline ahead of time). An
example is apropos. Given a | delimiter and part-of-speech tags as payloads, we
could have a stream that looks like:
{quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ
dogs|NN{quote}
In this case, the filter would produce the following tokens and payloads
(assuming whitespace tokenization):
Token: The
Payload: null
Token: quick
Payload: JJ
Token: red
Payload: JJ
and so on.
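A minimal sketch of the splitting step follows. It is written as plain Java rather than against Lucene's TokenStream API so that it stays self-contained. The names `DelimitedSplitSketch`, `TokenWithPayload` and `split` are hypothetical. Splitting at the first occurrence of the delimiter is also an assumption; the issue does not specify it.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimitedSplitSketch {

    /** Hypothetical holder: a term plus its raw payload text (null if none). */
    static final class TokenWithPayload {
        final String term;
        final String payload; // null when the token carried no delimiter

        TokenWithPayload(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return "Token: " + term + " Payload: " + payload;
        }
    }

    /**
     * Whitespace-tokenizes the text, then splits each token at the FIRST
     * occurrence of the delimiter (an assumption of this sketch): the left
     * part becomes the term and the right part becomes the payload.
     */
    static List<TokenWithPayload> split(String text, char delimiter) {
        List<TokenWithPayload> out = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return out; // nothing to tokenize
        }
        for (String raw : text.trim().split("\\s+")) {
            int idx = raw.indexOf(delimiter);
            if (idx < 0) {
                out.add(new TokenWithPayload(raw, null));
            } else {
                out.add(new TokenWithPayload(raw.substring(0, idx), raw.substring(idx + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
        for (TokenWithPayload t : split(input, '|')) {
            System.out.println(t);
        }
    }
}
```

Tokens without a delimiter (`The`, `over`, `the`) come out with a null payload, which matches the null payloads listed above.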
This patch will also support pluggable encoders for the payloads, so the
payload text can be converted from a character array to a byte array as
appropriate.
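The pluggable-encoder idea could look something like the sketch below. `PayloadEncoder`, `Utf8Encoder` and `FloatEncoder` are hypothetical names chosen for illustration and are not claimed to be the patch's actual classes. The two encoders show that the same payload characters can be turned into different byte layouts.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class PayloadEncoderSketch {

    /** Hypothetical pluggable encoder: converts payload characters to bytes. */
    interface PayloadEncoder {
        byte[] encode(char[] buffer, int offset, int length);
    }

    /** Stores the payload text as its UTF-8 bytes (e.g. "JJ" -> 2 bytes). */
    static final class Utf8Encoder implements PayloadEncoder {
        @Override
        public byte[] encode(char[] buffer, int offset, int length) {
            ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(buffer, offset, length));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        }
    }

    /** Parses the payload text as a float and stores its 4-byte big-endian form. */
    static final class FloatEncoder implements PayloadEncoder {
        @Override
        public byte[] encode(char[] buffer, int offset, int length) {
            float value = Float.parseFloat(new String(buffer, offset, length));
            return ByteBuffer.allocate(4).putFloat(value).array();
        }
    }

    public static void main(String[] args) {
        char[] token = "fox|2.5".toCharArray();
        int start = new String(token).indexOf('|') + 1; // first char after the delimiter
        int length = token.length - start;
        byte[] asText = new Utf8Encoder().encode(token, start, length);
        byte[] asFloat = new FloatEncoder().encode(token, start, length);
        System.out.println(asText.length + " " + asFloat.length); // prints "3 4"
    }
}
```

Keeping the encoder behind an interface lets the filter stay agnostic about payload format. A caller would pick text payloads for tags like `JJ`, or a numeric encoder for weights.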