[jira] [Updated] (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

David Smiley (Jira) Tue, 10 Sep 2019 14:13:30 -0700


     [ https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Smiley updated SOLR-1954:
-------------------------------
    Attachment: SOLR-1954.patch
      Assignee: David Smiley
        Status: Open  (was: Open)

At the Lucene/Solr Hackday event, I worked on this for the Unified Highlighter.
I'm attaching a patch that is very much a WIP but basically works. It adds an
"hl.extended" boolean flag that, when enabled, returns a structured, detailed
response in place of the list of snippets. TODOs:
* Expose more information; so far I've only added a couple of things.
* Probably make the format nicer. There are definitely some rough edges in
this code, with TODOs and WIP bits still in place; it needs tidying up.
* SolrJ QueryResponse
* Ref guide
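To make the difference in response shape concrete, here is a minimal Java sketch contrasting the existing layout (field name mapped to a list of snippet strings) with one possible structured layout that keeps each snippet's offsets and score. The `SnippetInfo` record, the key names (`text`, `startOffset`, `endOffset`, `score`), the `respond` helper, and the offset values are all hypothetical illustrations and are not taken from the attached patch; the actual `hl.extended` format may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExtendedHighlightSketch {

    // Hypothetical per-snippet data; not the classes used in the attached patch.
    record SnippetInfo(String text, int startOffset, int endOffset, float score) {}

    // Existing shape: field name -> list of snippet strings (offsets and score are dropped).
    static Map<String, List<String>> plain(Map<String, List<SnippetInfo>> byField) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        byField.forEach((field, snippets) ->
            out.put(field, snippets.stream().map(SnippetInfo::text).toList()));
        return out;
    }

    // Hypothetical extended shape: field name -> one structured entry per snippet.
    static Map<String, List<Map<String, Object>>> extended(Map<String, List<SnippetInfo>> byField) {
        Map<String, List<Map<String, Object>>> out = new LinkedHashMap<>();
        byField.forEach((field, snippets) -> {
            List<Map<String, Object>> entries = snippets.stream().map(s -> {
                Map<String, Object> entry = new LinkedHashMap<>();
                entry.put("text", s.text());
                entry.put("startOffset", s.startOffset());
                entry.put("endOffset", s.endOffset());
                entry.put("score", s.score());
                return entry;
            }).toList();
            out.put(field, entries);
        });
        return out;
    }

    // A boolean flag (modeled on hl.extended) selects which shape is returned.
    static Object respond(Map<String, List<SnippetInfo>> byField, boolean hlExtended) {
        return hlExtended ? extended(byField) : plain(byField);
    }

    public static void main(String[] args) {
        // Illustrative values only, not real Solr output.
        Map<String, List<SnippetInfo>> byField = Map.of(
            "text", List.of(new SnippetInfo("DDR <em>SDRAM</em> Unbuffered", 32, 52, 2.0f)));
        System.out.println(respond(byField, false));
        System.out.println(respond(byField, true));
    }
}
```

Because the structured form replaces the snippet list rather than extending it, gating it behind an opt-in flag leaves existing clients and tests that expect the plain list unaffected.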

> Highlighter component should expose snippet character offsets and the score.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1954
>                 URL: https://issues.apache.org/jira/browse/SOLR-1954
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>         Attachments: SOLR-1954.patch, SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character
> offsets or the score. There is a TODO in DefaultSolrHighlighter indicating
> the intention to add this eventually. This information is needed when doing
> highlighting on external content. The data is there, so it's pretty easy to
> output it in some way. The challenge is deciding on the output format and its
> ramifications for backwards compatibility. Unfortunately, the current
> highlighter component response structure doesn't lend itself to adding any
> new data; I wish the original implementer had shown more foresight, and all
> the highlighting tests assume this structure. Here is a snippet of the
> current response structure for a search on "sdram" against Solr's sample
> data, for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="text">
>       <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps, as a little hack, we could introduce a pseudo field called
> text_startCharOffset, named by concatenating the matching field name and
> "_startCharOffset". This would be an array of ints. Likewise, there would
> be additional arrays for endCharOffset and score.
> Thoughts?
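The pseudo-field proposal above can be sketched in Java as parallel arrays keyed by the field name plus a suffix, together with how a client highlighting external content might use the offsets to locate each snippet's source region. The `_endCharOffset` and `_score` suffixes are inferred from the proposal's wording, and the `Snippet` record, helper methods, sample text, and score value are hypothetical; this is an illustration of the idea, not Solr code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PseudoFieldSketch {

    // Hypothetical snippet holder; offsets are char positions into the original text.
    record Snippet(String text, int startCharOffset, int endCharOffset, float score) {}

    // Flatten one field's snippets into parallel arrays under pseudo field names:
    // "<field>", "<field>_startCharOffset", "<field>_endCharOffset", "<field>_score".
    static Map<String, List<Object>> toPseudoFields(String field, List<Snippet> snippets) {
        List<Object> texts = new ArrayList<>();
        List<Object> starts = new ArrayList<>();
        List<Object> ends = new ArrayList<>();
        List<Object> scores = new ArrayList<>();
        for (Snippet s : snippets) {
            texts.add(s.text());
            starts.add(s.startCharOffset());
            ends.add(s.endCharOffset());
            scores.add(s.score());
        }
        Map<String, List<Object>> out = new LinkedHashMap<>();
        out.put(field, texts); // the existing snippet array keeps its name
        out.put(field + "_startCharOffset", starts);
        out.put(field + "_endCharOffset", ends);
        out.put(field + "_score", scores);
        return out;
    }

    // A client holding the external content can recover the i-th snippet's source
    // region by reading the same index from the parallel offset arrays.
    static String sourceRegion(String externalText, Map<String, List<Object>> pseudo,
                               String field, int i) {
        int start = (Integer) pseudo.get(field + "_startCharOffset").get(i);
        int end = (Integer) pseudo.get(field + "_endCharOffset").get(i);
        return externalText.substring(start, end);
    }

    public static void main(String[] args) {
        String external = "CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400";
        int start = external.indexOf("DDR SDRAM");
        int end = start + "DDR SDRAM".length();
        List<Snippet> snippets = List.of(new Snippet("DDR <em>SDRAM</em>", start, end, 1.5f));
        Map<String, List<Object>> pseudo = toPseudoFields("text", snippets);
        System.out.println(pseudo.keySet()); // [text, text_startCharOffset, text_endCharOffset, text_score]
        System.out.println(sourceRegion(external, pseudo, "text", 0)); // DDR SDRAM
    }
}
```

The appeal of this layout is that the original "text" array is untouched, so existing consumers keep working; the cost is that clients must correlate entries across several arrays by index.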



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

