[jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation

Uwe Schindler (JIRA) Sun, 24 Apr 2011 04:23:49 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024615#comment-13024615
 ]


Uwe Schindler commented on SOLR-2400:
-------------------------------------

After thinking a little bit more, I think it would even be possible to add this 
Filter after *each* step to track tokens. The resulting Attribute would then 
contain the whole tracking of positions:
- After the Tokenizer this attribute would contain "0", "1", "2",...
- After the first TokenFilter: "0.0", "1.1", "1.2", "1.3", "2.2" (where the 
second token (1) emitted by the Tokenizer was split into 3 tokens). I think 
this would help? Additionally, the Filter could use PositionIncrement to track 
same-position tokens, or this could be left to the consumer (so if 1.2 and 1.3 
have posIncr 0, the consumer knows that they are all at the same position). If the 
TokenFilter used the PosIncr to increment the unique IDs, then this would 
be solved (so 1.x tokens would always get "1.1" as ID if at the same position).
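
To make the PosIncr-based variant concrete, here is a rough sketch in plain 
Java. It does not use the Lucene API at all: Tok, track() and split() are 
hypothetical stand-ins invented for illustration, and the rule "append the 
token's position at this stage to its ID" is only one possible reading of the 
scheme described above, not an actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenTrackingSketch {

    /** Hypothetical stand-in for a token: tracking ID so far plus position increment. */
    static final class Tok {
        final String id;
        final int posIncr;
        Tok(String id, int posIncr) { this.id = id; this.posIncr = posIncr; }
    }

    /** Models the tracking step: appends the token's position at this stage to its ID. */
    static List<Tok> track(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int pos = -1; // so that a first token with posIncr 1 lands on position 0
        for (Tok t : in) {
            pos += t.posIncr; // posIncr 0 keeps the token on the previous position
            String id = t.id.isEmpty() ? Integer.toString(pos) : t.id + "." + pos;
            out.add(new Tok(id, t.posIncr));
        }
        return out;
    }

    /** Hypothetical filter: splits the token with the given ID into n tokens at one position. */
    static List<Tok> split(List<Tok> in, String targetId, int n) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (!t.id.equals(targetId)) {
                out.add(t);
                continue;
            }
            for (int i = 0; i < n; i++) {
                // the first part keeps the original increment, the rest get posIncr 0
                out.add(new Tok(t.id, i == 0 ? t.posIncr : 0));
            }
        }
        return out;
    }

    /** Extracts just the tracking IDs. */
    static List<String> ids(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok t : toks) {
            out.add(t.id);
        }
        return out;
    }

    /** Tokenizer emits three tokens; a filter then splits the second one into three. */
    static List<String> demo() {
        List<Tok> raw = Arrays.asList(new Tok("", 1), new Tok("", 1), new Tok("", 1));
        List<Tok> afterTokenizer = track(raw); // IDs "0", "1", "2"
        return ids(track(split(afterTokenizer, "1", 3)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0.0, 1.1, 1.1, 1.1, 2.2]
    }
}
```

With this reading, all three parts of the split token share the ID "1.1", so 
the consumer can see both where a token came from and which tokens sit at the 
same position. Giving each part its own ID would instead need a separate 
counter, and the consumer would then have to look at posIncr itself.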

I will think about it and supply a patch that enriches the 
FieldAnalysisContentHandler with this extra attribute.

We can then iterate. But today is Easter Holiday, so a little bit later...

> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
>                 Key: SOLR-2400
>                 URL: https://issues.apache.org/jira/browse/SOLR-2400
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Stefan Matheis (steffkes)
>            Priority: Minor
>         Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 
> 110303_FieldAnalysisRequestHandler_view.png
>
>
> The XML-Output (simplified example attached) is missing one small piece of 
> information which could be very useful for building a nice Analysis-Output, and that's 
> "Token-Relation" (if there is a special/correct word for this, please correct 
> me).
> Meaning that it is actually not possible to "follow" the Analysis-Process 
> (completely) when the Tokenizers/Filters drop Tokens (e.g. StopWord) 
> or split them into multiple Tokens (e.g. WordDelimiter).
> Would it be possible to include this information? If so, it would be possible 
> to create an improved Analysis-Page for the new Solr Admin (SOLR-2399) - 
> short scribble attached

