[
https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024615#comment-13024615
]
Uwe Schindler commented on SOLR-2400:
-------------------------------------
After thinking a little bit more, I think it would even be possible to add this
Filter after *each* step to track tokens. The resulting Attribute would then
contain the whole tracking history of positions:
- After the Tokenizer, this attribute would contain "0", "1", "2", ...
- After the first TokenFilter: "0.0", "1.1", "1.2", "1.3", "2.2" (because the
second token (1) emitted by the Tokenizer was split into 3 tokens). Would this
help? Additionally, the Filter could use the PositionIncrement to track
same-position tokens, or this could be left to the consumer (so if 1.2 and 1.3
have posIncr 0, the consumer knows that they are all at the same position). If
the TokenFilter used the PosIncr to increment the unique IDs, this would be
solved (so the 1.x tokens would all get "1.1" as their ID if at the same position).
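The numbering idea can be illustrated with a minimal plain-Java simulation. It is a sketch only and makes a few assumptions. Every stage maps each input token to zero or more output tokens, and each output token appends one component to its parent's ID. That component is either a running count of the tokens the stage has emitted, or (with `usePositions`) the position derived from the position increments. `Tok`, `tokenize`, `applyStage` and `splitOnHyphen` are hypothetical names, not Lucene or Solr APIs. The comment itself proposes a real TokenFilter plus a custom Attribute. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical simulation of dotted tracking IDs; not the Lucene TokenStream API. */
public class TokenTrackingSketch {

    /** Term text, position increment, and the dotted tracking ID. */
    record Tok(String text, int posIncr, String id) {}

    /** Tokenizer stage: split on whitespace; IDs are "0", "1", "2", ... */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], 1, Integer.toString(i)));
        }
        return out;
    }

    /**
     * Runs one filter stage (input token -> zero or more output tokens) and appends
     * a component to each output ID: a running count of emitted tokens, or, when
     * usePositions is true, the position summed from the output increments.
     * (Increments of dropped tokens are not carried forward in this sketch.)
     */
    static List<Tok> applyStage(List<Tok> input, Function<Tok, List<Tok>> stage,
                                boolean usePositions) {
        List<Tok> out = new ArrayList<>();
        int emitted = 0;
        int position = -1;
        for (Tok in : input) {
            for (Tok t : stage.apply(in)) {
                position += t.posIncr();
                int component = usePositions ? position : emitted;
                out.add(new Tok(t.text(), t.posIncr(), in.id() + "." + component));
                emitted++;
            }
        }
        return out;
    }

    /** Hypothetical splitter: "b-c-d" -> b, c, d; extra pieces stacked at posIncr 0. */
    static List<Tok> splitOnHyphen(Tok in) {
        List<Tok> out = new ArrayList<>();
        String[] parts = in.text().split("-");
        for (int i = 0; i < parts.length; i++) {
            // The ID is filled in by applyStage, so an empty placeholder is used here.
            out.add(new Tok(parts[i], i == 0 ? in.posIncr() : 0, ""));
        }
        return out;
    }

    /** Extracts just the IDs for printing. */
    static List<String> ids(List<Tok> toks) {
        List<String> result = new ArrayList<>();
        for (Tok t : toks) {
            result.add(t.id());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> tokens = tokenize("a b-c-d e");
        System.out.println(ids(tokens));
        System.out.println(ids(applyStage(tokens, TokenTrackingSketch::splitOnHyphen, false)));
        System.out.println(ids(applyStage(tokens, TokenTrackingSketch::splitOnHyphen, true)));
    }
}
```

Running `main` prints `[0, 1, 2]`, then `[0.0, 1.1, 1.2, 1.3, 2.4]` for the running-count variant, then `[0.0, 1.1, 1.1, 1.1, 2.2]` for the position variant. In the position variant, tokens stacked at the same position share one ID.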
I will think about it and supply a patch that enriches the
FieldAnalysisContentHandler with this extra attribute.
We can then iterate. But today is the Easter holiday, so a little bit later...
> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
> Key: SOLR-2400
> URL: https://issues.apache.org/jira/browse/SOLR-2400
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Stefan Matheis (steffkes)
> Priority: Minor
> Attachments: 110303_FieldAnalysisRequestHandler_output.xml,
> 110303_FieldAnalysisRequestHandler_view.png
>
>
> The XML output (simplified example attached) is missing one small piece of
> information that could be very useful for building a nice analysis output:
> "token relation" (if there is a special/correct word for this, please correct
> me).
> This means it is currently not possible to "follow" the analysis process
> (completely) when Tokenizers/Filters drop tokens (e.g. StopWord)
> or split a token into multiple tokens (e.g. WordDelimiter).
> Would it be possible to include this information? If so, it would be possible
> to create an improved Analysis page for the new Solr Admin (SOLR-2399); a
> short scribble is attached.
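With dotted tracking IDs like the ones proposed in the comment, a consumer such as an analysis page could "follow" each original token by its first ID component. The sketch below assumes running-count IDs. The hard-coded list is purely illustrative and is not real Solr output. It imagines "the quick wi-fi" going through a tokenizer (IDs 0, 1, 2), a stop filter that drops "the" (giving "1.0" and "2.1"), and a hyphen splitter (giving "1.0.0", "2.1.1", "2.1.2"). `TokenLineageSketch`, `descendants` and `describe` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consumer-side view: which tokenizer tokens were dropped or split. */
public class TokenLineageSketch {

    /** Counts how many final tokens descend from each tokenizer token (first ID component). */
    static Map<Integer, Integer> descendants(int tokenizerCount, List<String> finalIds) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tokenizerCount; i++) {
            counts.put(i, 0); // start at zero so dropped tokens still show up
        }
        for (String id : finalIds) {
            int root = Integer.parseInt(id.split("\\.")[0]);
            counts.merge(root, 1, Integer::sum);
        }
        return counts;
    }

    /** Human-readable summary of one tokenizer token's fate. */
    static String describe(int count) {
        if (count == 0) {
            return "dropped";
        }
        if (count == 1) {
            return "1 token";
        }
        return "split into " + count;
    }

    public static void main(String[] args) {
        // Illustrative IDs after tokenizer -> stop filter -> hyphen splitter.
        List<String> finalIds = List.of("1.0.0", "2.1.1", "2.1.2");
        descendants(3, finalIds).forEach((root, n) ->
            System.out.println("token " + root + ": " + describe(n)));
    }
}
```

This prints `token 0: dropped`, `token 1: 1 token` and `token 2: split into 2`. Those are the two cases (StopWord drops and WordDelimiter splits) that the request wants to make visible.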