[ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:
--------------------------------
    Description: 
When phrase fields (pf) are defined for the edismax parser and the query 
contains a range clause, edismax expands the query to the phrase fields but 
adds a wrong term to the phrase. That term is the upper limit of the range query.

In the Solr tutorial example, add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
<str name="pf">name</str> 
{code}

Try this query against the /browse request handler with query explain enabled:
{code}
apple AND weight:[* TO 60]
{code}
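
For reference, the same request can be assembled by hand. The sketch below only builds the URL and does not contact Solr. The host, port and the core name `techproducts` are assumptions and need adjusting to the local setup; `q`, `debugQuery` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of the tutorial instance; adjust host, port and core name.
SOLR_BASE = "http://localhost:8983/solr/techproducts"


def build_debug_url(query, handler="/browse"):
    """Return a request URL that asks Solr to include debug (explain) output."""
    params = {
        "q": query,
        "debugQuery": "true",  # adds the parsed query and score explanations
        "wt": "json",          # request JSON instead of the handler's default writer
    }
    return SOLR_BASE + handler + "?" + urlencode(params)


url = build_debug_url("apple AND weight:[* TO 60]")
print(url)
```

The parsed query in the debug section of the response is where the unexpected phrase shows up.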
The query explain output contains:
{code}
..... DisjunctionMaxQuery((name:"apple 60").....
{code}
As you can see, the term 60 (from the range query) has been added to the 
phrase, and this will boost the score of any document that "apple 60" matches.
To demonstrate this, add the following document to the index 
(ipod_video.xml with minor changes):

{code}
<add><doc>
  <field name="id">MA147LL/B</field>
  <field name="name">Apple 120 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <!-- Join -->
  <field name="manu_id_s">apple</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">6.5</field>
  <field name="price">599.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <!-- Dodge City store -->
  <field name="store">37.7752,-100.0232</field>
  <field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>
</doc></add>
{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It finds two documents as expected, and their scores should be identical. 
Instead they are 0.65656495 and 0.3007804.

The scores differ because the phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).
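
The effect can be pictured with a toy phrase matcher. This is only an illustration, not Solr's analysis chain or scoring: tokens are produced by lowercasing and splitting on whitespace, and the name of the stock document is an assumption taken from the ipod_video.xml file that ships with the example.

```python
def tokens(text):
    """Lowercase and split on whitespace; a stand-in for real text analysis."""
    return text.lower().split()


def phrase_matches(phrase, field_value):
    """Return True if the phrase tokens occur consecutively in the field value."""
    p, f = tokens(phrase), tokens(field_value)
    return any(f[i:i + len(p)] == p for i in range(len(f) - len(p) + 1))


names = {
    # Assumed name of the unmodified tutorial document.
    "stock ipod_video.xml": "Apple 60 GB iPod with Video Playback Black",
    # Name of the modified copy added above (id MA147LL/B).
    "modified copy": "Apple 120 GB iPod with Video Playback Black",
}

for label, name in names.items():
    print(label, phrase_matches("apple 60", name))
```

Only the stock document contains "apple" directly followed by "60", so only that document receives the phrase boost.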

The phrase field expansion can go much further wrong than this and include 
the lower limit, the upper limit and the "TO" keyword from the range clause. 
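
One possible client-side workaround, assuming the range restriction is only meant to filter results and not to influence scoring, is to send it as a filter query (`fq`) so that it is kept out of `q`. This is a sketch and not something proposed in this issue. As before, the host, port and core name are assumptions, and the code only builds the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of the tutorial instance; adjust host, port and core name.
SOLR_BASE = "http://localhost:8983/solr/techproducts"


def build_filtered_url(text_query, range_filter, handler="/browse"):
    """Keep the free-text part in q and move the range clause into fq."""
    params = [
        ("q", text_query),      # only this part is handled as the main query
        ("fq", range_filter),   # filter query: restricts results, no score impact
        ("debugQuery", "true"),
        ("wt", "json"),
    ]
    return SOLR_BASE + handler + "?" + urlencode(params)


filtered_url = build_filtered_url("apple", "weight:[* TO 60]")
print(filtered_url)
```

With the range clause outside `q`, the phrase built for the `pf` fields has no range limit to pick up.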

/Thomas Egense and Toke Eskildsen

> Phrase Fields ranking bug for range search when using edismax queryparser
> -------------------------------------------------------------------------
>
>                 Key: SOLR-8584
>                 URL: https://issues.apache.org/jira/browse/SOLR-8584
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.7, 4.8.1, 4.9.1, 4.10.4, 5.0, 5.1, 5.2, 5.3.1, 5.4
>            Reporter: Thomas Egense
>            Priority: Minor
>              Labels: edismax, phrasequery
