[ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024615#comment-13024615 ]
Uwe Schindler commented on SOLR-2400:
-------------------------------------

After thinking a little bit more, I think it would even be possible to add this Filter after *each* step to track tokens. The resulting Attribute would then contain the whole tracking of positions:
- After the Tokenizer, this attribute would contain "0", "1", "2", ...
- After the first TokenFilter: "0.0", "1.1", "1.2", "1.3", "2.2" (because the second token (1) emitted by the Tokenizer was split into 3 tokens).

I think this would help? Additionally, the Filter could use PositionIncrement to track same-position tokens, or this could be left to the consumer (so if 1.2 and 1.3 have posIncr 0, the consumer knows that they are all at the same position). If the TokenFilter used the PosIncr to increment the unique IDs, this would be solved (so 1.x tokens would always get "1.1" as ID if at the same position).

I will think about it and supply a patch that enriches the FieldAnalysisContentHandler with this extra attribute. We can then iterate. But today is an Easter holiday, so a little bit later...

> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
>                 Key: SOLR-2400
>                 URL: https://issues.apache.org/jira/browse/SOLR-2400
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Stefan Matheis (steffkes)
>            Priority: Minor
>         Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png
>
>
> The XML output (simplified example attached) is missing one small piece of
> information, which could be very useful for building a nice Analysis-Output:
> "Token-Relation" (if there is a special/correct word for this, please correct
> me).
> Meaning that it is currently not possible to "follow" the analysis process
> (completely) when the Tokenizers/Filters drop tokens (e.g. StopWord) or
> split them into multiple tokens (e.g. WordDelimiter).
> Would it be possible to include this Information? If so, it would be possible
> to create an improved Analysis-Page for the new Solr Admin (SOLR-2399) -
> short scribble attached
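The hierarchical tracking IDs proposed in the comment can be modelled outside Lucene to see how a consumer would follow splits and drops. The sketch below does this. It is only a model: a real implementation would be a Lucene TokenFilter with a custom Attribute in Java.

All names here are hypothetical and not part of Lucene or Solr: `Token`, `apply_stage`, `split_hyphens` and `drop_stopwords`. The two stages are simplified stand-ins for WordDelimiter- and StopWord-style filtering. How the number after each "." is chosen is also an assumption. Here it is a running counter over each stage's output. With `use_pos_incr=True`, the counter advances only when posIncr > 0.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Token:
    text: str
    pos_incr: int  # Lucene-style position increment (0 = same position as previous token)
    track_id: str  # hypothetical "token-relation" value, e.g. "1.2"


# A stage maps one input token to zero or more (text, pos_incr) outputs.
Stage = Callable[[Token], List[Tuple[str, int]]]


def tokenize(text: str) -> List[Token]:
    """Whitespace tokenizer stand-in: each token's ID is its index ("0", "1", ...)."""
    return [Token(t, 1, str(i)) for i, t in enumerate(text.split())]


def apply_stage(tokens: List[Token], stage: Stage, use_pos_incr: bool = False) -> List[Token]:
    """Run one filter stage and extend every output ID with ".<n>".

    n is a running counter over this stage's output. With use_pos_incr=True the
    counter only advances when pos_incr > 0, so tokens stacked at the same
    position share a number (assumes the first token has pos_incr >= 1).
    """
    out: List[Token] = []
    counter = -1
    for tok in tokens:
        for text, pos_incr in stage(tok):
            if not use_pos_incr or pos_incr > 0:
                counter += 1
            out.append(Token(text, pos_incr, f"{tok.track_id}.{counter}"))
    return out


def split_hyphens(tok: Token) -> List[Tuple[str, int]]:
    """WordDelimiter-like stand-in: "Wi-Fi" -> "Wi", "Fi", plus "WiFi" at posIncr 0."""
    parts = tok.text.split("-")
    if len(parts) == 1:
        return [(tok.text, tok.pos_incr)]
    return ([(parts[0], tok.pos_incr)]
            + [(p, 1) for p in parts[1:]]
            + [("".join(parts), 0)])


def drop_stopwords(tok: Token) -> List[Tuple[str, int]]:
    """StopWord-like stand-in; unlike Lucene's StopFilter it does not carry the
    removed token's position gap over to the next token."""
    return [] if tok.text.lower() in {"the", "a", "an"} else [(tok.text, tok.pos_incr)]


if __name__ == "__main__":
    tokens = tokenize("the Wi-Fi router")
    stage1 = apply_stage(tokens, split_hyphens)
    print([(t.text, t.track_id) for t in stage1])
    # [('the', '0.0'), ('Wi', '1.1'), ('Fi', '1.2'), ('WiFi', '1.3'), ('router', '2.4')]
    print([t.track_id for t in apply_stage(tokens, split_hyphens, use_pos_incr=True)])
    # ['0.0', '1.1', '1.2', '1.2', '2.3']
    print([t.track_id for t in apply_stage(stage1, drop_stopwords)])
    # ['1.1.0', '1.2.1', '1.3.2', '2.4.3']
```

Every ID keeps its parent's prefix, so a consumer such as an analysis page can tell two things from the output alone:
- "0.0" has no descendant after the stop stage, so that token was dropped.
- "1.1" to "1.3" all stem from Tokenizer token "1".

In the posIncr variant, the stacked "WiFi" shares "1.2" with "Fi". This is the "same position, same ID" behaviour discussed in the comment.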