[
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tomás Fernández Löbbe updated SOLR-6216:
----------------------------------------
Attachment: FacetTester.java
[~dsmiley] I used the attached Java class to run the queries. As I said, the
dataset was geonames, indexed 4 times (with different IDs) so that the index
had 33M docs in total.
The queries are all boolean queries with two OR'd terms, generated by taking
terms from the “name” field of the dataset. Some examples:
{noformat}
name:cemetery name:lake
name:el name:historical
name:church name:el
name:dam name:la
name:al name:church
name:al name:creek
name:baptist name:la
name:la name:mount
name:creek name:de
name:center name:park
name:church name:creek
...
{noformat}
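For reference, a run like this can be driven from SolrJ roughly as follows. This is a simplified sketch, not the attached FacetTester.java; the interval parameter names and the population intervals below are only illustrative of how the patch is meant to be used:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class IntervalQueryRunner {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/geonames");
    // Two-term OR queries like the ones listed above (q.op defaults to OR)
    String[] queries = {"name:cemetery name:lake", "name:el name:historical"};
    for (String q : queries) {
      SolrQuery query = new SolrQuery(q);
      query.set("facet", true);
      // Interval faceting on the docValues "population" field (illustrative intervals)
      query.set("facet.interval", "population");
      query.add("f.population.facet.interval.set",
          "[0,1000)", "[1000,10000)", "[10000,*]");
      QueryResponse rsp = server.query(query);
      System.out.println(q + " -> QTime=" + rsp.getQTime());
    }
    server.shutdown();
  }
}
{code}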
Eyeballing the logs, most of those queries matched a large number of docs in
the index. In addition, I had a bash script running that added documents every
second:
{noformat}
#!/bin/bash
# Feed allCountries.txt to Solr one line (one doc) per second.
# IFS is set to a literal newline so that tabs inside each TSV line are preserved.
IFS=$'\n'
while read -r q
do
  echo "$q" > tmp.doc
  curl -v \
    'http://localhost:8983/solr/geonames/update?stream.file=/absolute/path/to/tmp.doc&stream.contentType=text/csv;charset=utf-8&separator=%09&encapsulator=%0E&header=false&fieldnames=id,name,,alternatenames,latitude,longitude,feature_class,feature_code,country_code,cc2,admin1_code,admin2_code,admin3_code,admin4_code,population,elevation,dem,timezone,modification_date&f.alternatenames.split=true&f.alternatenames.separator=,&f.alternatenames.encapsulator=%0E&f.cc2.split=true&f.cc2.separator=,&f.cc2.encapsulator=%0E'
  sleep 1
done < allCountries.txt
{noformat}
Unfortunately, it looks like I deleted the schema file I used; there was
nothing unusual about it, though: population is an int field with
docValues=true, and autoSoftCommit is configured to run every second.
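For reference, the relevant pieces of configuration looked roughly like this (reconstructed from memory since the schema file is gone, so the exact type name and values are approximate):
{code:xml}
<!-- schema.xml: the population field uses doc values -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<field name="population" type="int" indexed="true" stored="true" docValues="true"/>

<!-- solrconfig.xml: soft commit every second -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
{code}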
For the second test, I can’t upload the code because it’s full of
customer-specific data, but the test is very similar. I took some production
queries, which had “intervals” on 6 fields, around 40 “intervals” in total
(originally implemented as a facet query for each of them). For that test I
also used a similar bash script to upload data every second.
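To give an idea of the kind of replacement involved (the field name below is made up, since I can’t share the real ones), each group of facet.query ranges maps to interval parameters roughly like this:
{noformat}
# before: one facet.query per range
facet=true
facet.query=price:[0 TO 100}
facet.query=price:[100 TO 500}
facet.query=price:[500 TO *]

# after: the same ranges expressed as intervals on the docValues field
facet=true
facet.interval=price
f.price.facet.interval.set=[0,100)
f.price.facet.interval.set=[100,500)
f.price.facet.interval.set=[500,*]
{noformat}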
I have been testing this code in an environment mirroring production for around
two to three weeks now, and QTimes have improved dramatically (on a multi-shard
collection). I haven’t seen any errors related to this.
> Better faceting for multiple intervals on DV fields
> ---------------------------------------------------
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
> Issue Type: Improvement
> Reporter: Tomás Fernández Löbbe
> Assignee: Erick Erickson
> Fix For: 4.10
>
> Attachments: FacetTester.java, SOLR-6216.patch, SOLR-6216.patch,
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch,
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch
>
>
> There are two ways to have faceting on value ranges in Solr right now:
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ, docs)
> {code}
> The good thing about this implementation is that it can benefit from caching.
> The bad thing is that it may be slow with cold caches, and that there will be
> a query for each of the ranges.
> A different implementation would be one that works similarly to regular field
> faceting, using doc values and validating ranges for each value of the
> matching documents. This implementation would sometimes be faster than Range
> Faceting / Query Faceting, especially in cases where caches are not very
> effective, such as with a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by
> doing a facet query for every interval.