Lakshmi Venkataswamy created SOLR-4824:
------------------------------------------
Summary: Faceting results are changed after ingestion of documents
past a certain number
Key: SOLR-4824
URL: https://issues.apache.org/jira/browse/SOLR-4824
Project: Solr
Issue Type: Bug
Affects Versions: 4.3, 4.2
Environment: Ubuntu 12.04 LTS 12.04.2
jre1.7.0_17
jboss-as-7.1.1.Final
Reporter: Lakshmi Venkataswamy
While upgrading from Solr 3.6 to 4.2/4.3 and comparing results on fuzzy queries,
I found that once a certain number of documents had been ingested, the fuzzy
query returned drastically fewer results. We ingest approximately 18,000
documents per day, and after approximately 40 days of documents have been
ingested, adding the next day's documents causes the same fuzzy search to
return far fewer results.
The query:
http://10.100.1.48:8080/solr/coreTV3/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort
produces the following result before the threshold is crossed:
<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2349</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
<str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst>
<result name="response" numFound="362803" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">2866</int>
<int name="2013-01-01">11372</int>
<int name="2013-01-02">11514</int>
<int name="2013-01-03">12015</int>
<int name="2013-01-04">11746</int>
<int name="2013-01-05">10853</int>
<int name="2013-01-06">11053</int>
<int name="2013-01-07">11815</int>
<int name="2013-01-08">11427</int>
<int name="2013-01-09">11475</int>
<int name="2013-01-10">11461</int>
<int name="2013-01-11">12058</int>
<int name="2013-01-12">11335</int>
<int name="2013-01-13">12039</int>
<int name="2013-01-14">12064</int>
<int name="2013-01-15">12234</int>
<int name="2013-01-16">12545</int>
<int name="2013-01-17">11766</int>
<int name="2013-01-18">12197</int>
<int name="2013-01-19">11414</int>
<int name="2013-01-20">11633</int>
<int name="2013-01-21">12863</int>
<int name="2013-01-22">12378</int>
<int name="2013-01-23">11947</int>
<int name="2013-01-24">11822</int>
<int name="2013-01-25">11882</int>
<int name="2013-01-26">10474</int>
<int name="2013-01-27">11051</int>
<int name="2013-01-28">11776</int>
<int name="2013-01-29">11957</int>
<int name="2013-01-30">11260</int>
<int name="2013-01-31">8511</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>
Once the threshold of roughly 40 days of ingested documents is crossed, the
results for the same query drop, as shown below:
<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
<str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst>
<result name="response" numFound="1338" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">0</int>
<int name="2013-01-01">41</int>
<int name="2013-01-02">21</int>
<int name="2013-01-03">24</int>
<int name="2013-01-04">19</int>
<int name="2013-01-05">9</int>
<int name="2013-01-06">11</int>
<int name="2013-01-07">17</int>
<int name="2013-01-08">14</int>
<int name="2013-01-09">24</int>
<int name="2013-01-10">43</int>
<int name="2013-01-11">14</int>
<int name="2013-01-12">52</int>
<int name="2013-01-13">57</int>
<int name="2013-01-14">25</int>
<int name="2013-01-15">17</int>
<int name="2013-01-16">34</int>
<int name="2013-01-17">11</int>
<int name="2013-01-18">16</int>
<int name="2013-01-19">121</int>
<int name="2013-01-20">33</int>
<int name="2013-01-21">26</int>
<int name="2013-01-22">59</int>
<int name="2013-01-23">27</int>
<int name="2013-01-24">10</int>
<int name="2013-01-25">9</int>
<int name="2013-01-26">6</int>
<int name="2013-01-27">16</int>
<int name="2013-01-28">11</int>
<int name="2013-01-29">15</int>
<int name="2013-01-30">21</int>
<int name="2013-01-31">109</int>
<int name="2013-02-01">11</int>
<int name="2013-02-02">7</int>
<int name="2013-02-03">10</int>
<int name="2013-02-04">8</int>
<int name="2013-02-05">13</int>
<int name="2013-02-06">75</int>
<int name="2013-02-07">77</int>
<int name="2013-02-08">31</int>
<int name="2013-02-09">35</int>
<int name="2013-02-10">22</int>
<int name="2013-02-11">18</int>
<int name="2013-02-12">11</int>
<int name="2013-02-13">68</int>
<int name="2013-02-14">40</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>
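For anyone who wants to compare the two responses mechanically rather than by eye, here is a minimal Python sketch that pulls numFound and the per-date facet counts out of a Solr XML response. The function name summarize_response and the SAMPLE string are my own for illustration; SAMPLE is an abbreviated copy of the first response with only its first two facet entries.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the first response (only two facet entries kept).
SAMPLE = """<response>
<result name="response" numFound="362803" start="0"></result>
<lst name="facet_counts"><lst name="facet_fields"><lst name="date">
<int name="2012-12-31">2866</int>
<int name="2013-01-01">11372</int>
</lst></lst></lst>
</response>"""


def summarize_response(xml_text, facet_field="date"):
    """Return (numFound, {facet value: count}) for one Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("./result[@name='response']")
    num_found = int(result.get("numFound"))
    path = (
        "./lst[@name='facet_counts']/lst[@name='facet_fields']"
        f"/lst[@name='{facet_field}']/int"
    )
    counts = {node.get("name"): int(node.text) for node in root.findall(path)}
    return num_found, counts


num_found, counts = summarize_response(SAMPLE)
print(num_found, counts)
```

Running the same function over the full "before" and "after" responses gives two dictionaries keyed by date, which makes it straightforward to diff the counts day by day.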
I have also tested this with different months of data and have seen the same
issue at around the same number of documents.
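To repeat the test against other cores or data sets, the query could be rebuilt programmatically so that the parameters stay identical between runs. The sketch below is only an illustration: the helper name build_fuzzy_facet_url is hypothetical, and it sends facet.sort with an empty value, which I am assuming is equivalent to the bare facet.sort in the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_fuzzy_facet_url(base_url, field, term, max_edits, facet_field):
    """Build the select URL for a fuzzy query faceted on one field."""
    params = [
        ("q", f"{field}:{term}~{max_edits}"),  # fuzzy term, e.g. cc:worde~1
        ("facet", "on"),
        ("facet.field", facet_field),
        ("fl", facet_field),
        ("facet.sort", ""),  # empty value, as in the reported request
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_fuzzy_facet_url(
    "http://10.100.1.48:8080/solr/coreTV3", "cc", "worde", 1, "date"
)
# Decode the query string again to confirm the parameters round-trip.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```

The URL is percent-encoded, so it will not match the hand-typed one character for character, but decoding it yields the same parameter values.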