[jira] Issue Comment Edited: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

Erik Hatcher (JIRA) Fri, 18 Jun 2010 08:09:52 -0700

    [ https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880214#action_12880214 ]


Erik Hatcher edited comment on SOLR-1954 at 6/18/10 11:07 AM:
--------------------------------------------------------------

No, we're not talking about the same thing.   Here's what I'm suggesting:

{code}
{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>15},
  'response'=>{'numFound'=>3,'start'=>0,'maxScore'=>0.10558263,'docs'=>[
      {
        'id'=>'IW-02',
        'name'=>'iPod & iPod Mini USB 2.0 Cable',
        'manu'=>'Belkin',
        'weight'=>2.0,
        'price'=>11.5,
        'popularity'=>1,
        'inStock'=>false,
        'store_0_d'=>37.7752,
        'store_1_d'=>-122.4232,
        'store'=>'37.7752,-122.4232',
        'manufacturedate_dt'=>'2006-02-14T23:55:59Z',
        'cat'=>[
          'electronics',
          'connector'],
        'features'=>[
          'car power adapter for iPod, white'],
        'score'=>0.10558263}]
  },
  'highlighting'=>{
    'IW-02'=>{
      'features'=>['car power adapter for <em>iPod</em>, white'],
      'name'=>['<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable']}},
  'highlighting-extended-info'=>{
    'IW-02'=>{
      'text_startPos'=>[5]}}}
{code}

That way the 'highlighting' section remains untouched, and the extra information 
goes into a separate 'highlighting-extended-info' section (let's use a shorter 
name, though) that is a direct child of the root response, just like 
'highlighting' is.
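
A minimal client-side sketch of how this layout could be consumed, written against the hash format shown above (Solr's wt=ruby output). It assumes the extended section is keyed by document id and holds one `<field>_startPos` array per highlighted field (the comment's example uses `text`; the sketch uses `features`, which is actually highlighted in that response), aligned index-by-index with that field's snippet array. The section name is the placeholder from the comment, and the helper `snippets_with_offsets` and the offset value are hypothetical, not an existing Solr API.

```ruby
# Sketch only: the 'highlighting-extended-info' key and the '<field>_startPos'
# suffix come from the proposal above; neither is an existing Solr API.
EXT_KEY = 'highlighting-extended-info'

# Pair each snippet with its start offset from the sibling section
# (nil when no offset was sent). Clients that ignore EXT_KEY are unaffected.
def snippets_with_offsets(response)
  extended = response.fetch(EXT_KEY, {})
  response.fetch('highlighting', {}).each_with_object({}) do |(doc_id, fields), out|
    doc_ext = extended.fetch(doc_id, {})
    out[doc_id] = fields.each_with_object({}) do |(field, snippets), acc|
      starts = doc_ext.fetch("#{field}_startPos", [])
      acc[field] = snippets.each_with_index.map { |s, i| [s, starts[i]] }
    end
  end
end

# Hypothetical response trimmed to the relevant keys; the offset is made up.
response = {
  'highlighting' => {
    'IW-02' => {
      'features' => ['car power adapter for <em>iPod</em>, white'],
      'name' => ['<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable']}},
  EXT_KEY => {
    'IW-02' => {
      'features_startPos' => [0]}}
}

p snippets_with_offsets(response)['IW-02']['name']
# => [["<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable", nil]]
```

Keeping the offsets in a sibling section makes the change opt-in: code that never reads the new key sees exactly the old response, and a response without the new section still parses (every offset is simply nil).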


      was (Author: ehatcher):
    No, we're not talking about the same thing.   Here's what I'm suggesting:

{code}
{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>15},
  'response'=>{'numFound'=>3,'start'=>0,'maxScore'=>0.10558263,'docs'=>[
      {
        'id'=>'IW-02',
        'name'=>'iPod & iPod Mini USB 2.0 Cable',
        'manu'=>'Belkin',
        'weight'=>2.0,
        'price'=>11.5,
        'popularity'=>1,
        'inStock'=>false,
        'store_0_d'=>37.7752,
        'store_1_d'=>-122.4232,
        'store'=>'37.7752,-122.4232',
        'manufacturedate_dt'=>'2006-02-14T23:55:59Z',
        'cat'=>[
          'electronics',
          'connector'],
        'features'=>[
          'car power adapter for iPod, white'],
        'score'=>0.10558263}]
  },
  'facet_counts'=>{
    'facet_queries'=>{},
    'facet_fields'=>{
      'cat'=>[
        'electronics',3,
        'connector',2,
        'music',1],
      'manu_exact'=>[
        'Belkin',2,
        'Apple Computer Inc.',1]},
    'facet_dates'=>{}},
  'highlighting'=>{
    'IW-02'=>{
      'features'=>['car power adapter for <em>iPod</em>, white'],
      'name'=>['<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable']}},
  'highlighting-extended-info'=>{
    'IW-02'=>{
      'text_startPos'=>[5]}},
  'spellcheck'=>{
    'suggestions'=>[]}}
{code}

That way the 'highlighting' section remains untouched, and the extra information 
goes into a separate 'highlighting-extended-info' section (let's use a shorter 
name, though) that is a direct child of the root response, just like 
'highlighting' is.

  
> Highlighter component should expose snippet character offsets and the score.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1954
>                 URL: https://issues.apache.org/jira/browse/SOLR-1954
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets or the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is already available, so it's 
> pretty easy to output it in some way; the challenge is deciding on the output 
> format and its ramifications for backwards compatibility.  Unfortunately, the 
> current highlighter component response structure doesn't lend itself to 
> adding any new data.  I wish the original implementer had shown some 
> foresight.  Unfortunately, all the highlighting tests assume this structure.  
> Here is a snippet of the current response structure for Solr's sample data 
> when searching for "sdram", for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="text">
>       <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps, as a little hack, we could introduce a pseudo field such as 
> text_startCharOffset, named by concatenating the matching field's name and 
> "_startCharOffset".  Its value would be an array of ints.  Likewise, there 
> would be similar arrays for endCharOffset and score.
> Thoughts?
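
To make the trade-off in the pseudo-field proposal concrete, here is a minimal Ruby sketch (Ruby to match the wt=ruby format used elsewhere in this thread). It assumes the extra arrays are named `<field>_startCharOffset`, `<field>_endCharOffset` and `<field>_score` and hold one entry per snippet; the last two names, the exclusive end offset, the score value, and the helpers `with_pseudo_fields`, `real_fields` and `fragment_of` are hypothetical, not an existing Solr API.

```ruby
# Sketch of the pseudo-field idea: next to each highlighted field's snippet
# array sit extra arrays, one entry per snippet. Names and semantics are
# assumptions extrapolated from the proposal; this is not an existing Solr API.
SUFFIXES = %w[_startCharOffset _endCharOffset _score].freeze

# Build one document's highlighting entry from [snippet, start, end, score] tuples.
def with_pseudo_fields(field, fragments)
  {
    field => fragments.map { |f| f[0] },
    "#{field}_startCharOffset" => fragments.map { |f| f[1] },
    "#{field}_endCharOffset" => fragments.map { |f| f[2] },
    "#{field}_score" => fragments.map { |f| f[3] }
  }
end

# The compatibility cost: a client that treats every key as a field name
# must now filter out the pseudo fields by suffix.
def real_fields(doc_entry)
  doc_entry.keys.reject { |k| SUFFIXES.any? { |s| k.end_with?(s) } }
end

# The motivating case: with offsets (end treated as exclusive, an assumption),
# a client holding the original external text can locate the fragment itself.
def fragment_of(original, start_offset, end_offset)
  original[start_offset...end_offset]
end

original = 'CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail'
snippet  = 'CORSAIR ValueSelect 1GB 184-Pin DDR <em>SDRAM</em> Unbuffered DDR 400 (PC 3200) System Memory - Retail'
# The score 0.5 is a made-up value for illustration.
entry = with_pseudo_fields('text', [[snippet, 0, original.length, 0.5]])
p real_fields(entry) # => ["text"]
```

The hack keeps everything in one per-document list, but every existing consumer that iterates over the keys of that list would start seeing the pseudo fields as if they were highlighted fields, which is the backwards-compatibility concern raised above.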

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

