[jira] Commented: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

David Smiley (JIRA) Thu, 17 Jun 2010 20:57:52 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880063#action_12880063
 ]


David Smiley commented on SOLR-1954:
------------------------------------

I think a different component is overkill for such a small change. I agree it 
should be toggled with a parameter.

Since the majority of users don't care about this extra metadata, perhaps the 
existing structure should be retained when it is not asked for.  Nobody 
would have to change, not even the tests, and no Solr client would necessarily 
have to care.  When it is asked for (a rare need), the structure would then 
change to accommodate it.  This would break any client logic expecting to find 
the snippet where it usually is, because it wouldn't be there.  If this 
is undesirable or unacceptable, then there's the field-suffix method that I 
describe in this issue and that is implemented in the patch.  The only danger 
is that a client should not assume that all listed fields are arrays of 
strings, since some will be arrays of ints or floats.  My patch includes such a 
modification for SolrJ.
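To illustrate the client-side caution above, here is a minimal Java sketch of how a client might regroup the suffixed pseudo-fields into snippets. It is not the SolrJ change in the patch. It makes three assumptions:

* One document's highlighting entry has already been parsed into a Map<String, List<?>>.
* The suffixes are _startCharOffset, _endCharOffset and _score. Only the first is spelled out in the issue.
* The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side helper for the field-suffix convention; not part of SolrJ. */
class SuffixedHighlightParser {
    // Assumed suffix names; only "_startCharOffset" appears in the issue text.
    static final String START = "_startCharOffset";
    static final String END = "_endCharOffset";
    static final String SCORE = "_score";

    /** One highlighted snippet plus its optional metadata (null when not returned). */
    static final class Snippet {
        final String text;
        final Integer start;
        final Integer end;
        final Float score;

        Snippet(String text, Integer start, Integer end, Float score) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /** True for pseudo-fields holding ints or floats rather than snippet strings. */
    static boolean isMetadataKey(String key) {
        return key.endsWith(START) || key.endsWith(END) || key.endsWith(SCORE);
    }

    /** Groups one document's highlighting entries into snippets per real field. */
    static Map<String, List<Snippet>> parse(Map<String, List<?>> doc) {
        Map<String, List<Snippet>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<?>> e : doc.entrySet()) {
            String field = e.getKey();
            if (isMetadataKey(field)) {
                continue; // never treat the numeric arrays as arrays of strings
            }
            List<?> texts = e.getValue();
            List<?> starts = doc.get(field + START);
            List<?> ends = doc.get(field + END);
            List<?> scores = doc.get(field + SCORE);
            List<Snippet> snippets = new ArrayList<>();
            for (int i = 0; i < texts.size(); i++) {
                // Read numbers via Number so Integer/Long/Float/Double all work.
                snippets.add(new Snippet(
                        (String) texts.get(i),
                        starts == null ? null : ((Number) starts.get(i)).intValue(),
                        ends == null ? null : ((Number) ends.get(i)).intValue(),
                        scores == null ? null : ((Number) scores.get(i)).floatValue()));
            }
            result.put(field, snippets);
        }
        return result;
    }
}
```

This sketch also exposes a weakness of the suffix hack: a real field whose name happens to end in one of the suffixes would be misread as metadata.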

> Highlighter component should expose snippet character offsets and the score.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1954
>                 URL: https://issues.apache.org/jira/browse/SOLR-1954
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets nor the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is there, so it's pretty easy to 
> output it in some way.  The challenge is deciding on the output format and its 
> ramifications for backwards compatibility.  Unfortunately, the current 
> highlighter component response structure doesn't lend itself to adding any new 
> data.  I wish the original implementer had had more foresight.  All the 
> highlighting tests also assume this structure.  Here is a snippet of the current 
> response structure from Solr's sample data when searching for "sdram", for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="text">
>       <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps, as a little hack, we could introduce a pseudo-field called 
> text_startCharOffset, whose name is the matching field's name concatenated with 
> "_startCharOffset".  This would be an array of ints.  Likewise, there would 
> be another array for endCharOffset and another for score.
> Thoughts?
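The pseudo-field proposal above can be sketched on the writing side as follows. This is a minimal Java sketch under stated assumptions, not the code in DefaultSolrHighlighter or the attached patch. Its assumptions are:

* The Fragment type and the addField method are hypothetical.
* The withMetadata flag is a hypothetical stand-in for a request parameter that would toggle the feature.
* The _endCharOffset and _score suffixes are extrapolated from the issue's _startCharOffset example.

The existing string array is always written unchanged. The numeric arrays are added alongside it only when requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical writer for the proposed suffixed pseudo-fields; not Solr code. */
class SuffixedHighlightWriter {
    /** A scored fragment with character offsets into the original field value. */
    static final class Fragment {
        final String text;
        final int start;
        final int end;
        final float score;

        Fragment(String text, int start, int end, float score) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /**
     * Writes the snippet array for one field. When withMetadata is true, it also
     * writes parallel arrays (index i describes snippet i) under suffixed names.
     */
    static void addField(Map<String, List<?>> doc, String field,
                         List<Fragment> frags, boolean withMetadata) {
        List<String> texts = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        List<Integer> ends = new ArrayList<>();
        List<Float> scores = new ArrayList<>();
        for (Fragment f : frags) {
            texts.add(f.text);
            starts.add(f.start);
            ends.add(f.end);
            scores.add(f.score);
        }
        doc.put(field, texts); // existing structure, unchanged for current clients
        if (withMetadata) {
            doc.put(field + "_startCharOffset", starts); // array of ints
            doc.put(field + "_endCharOffset", ends);     // array of ints (assumed name)
            doc.put(field + "_score", scores);           // array of floats (assumed name)
        }
    }
}
```

Because the metadata lives in sibling keys rather than inside the existing array, a client that only reads the plain field name sees exactly what it sees today.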


