Timothy Potter created SOLR-10096:
-------------------------------------
Summary: Parallel SQL not returning the correct results for count
distinct aggregation
Key: SOLR-10096
URL: https://issues.apache.org/jira/browse/SOLR-10096
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: Parallel SQL
Affects Versions: 6.3
Reporter: Timothy Potter
While trying to compare the results of this perf exp. blogged here:
https://www.periscopedata.com/blog/use-subqueries-to-count-distinct-50x-faster.html
I tried this with 6.3 and it only returns the dashboard_id's:
{code}
select dashboard_id, count(distinct user_id) as ct from time_on_site_logs group
by dashboard_id
{code}
Results: (where is ct?)
{code}
{"result-set":{"docs":[
{"dashboard_id":0},
{"dashboard_id":2},
{"dashboard_id":5},
{"dashboard_id":6},
{"dashboard_id":8},
{"dashboard_id":10},
{"dashboard_id":12},
{"dashboard_id":13},
{"dashboard_id":14},
{"dashboard_id":15},
...
{code}
So I dropped the alias for {{count(distinct user_id)}} and got the wrong
results (i.e. it's not applying the distinct for user_id's):
{code}
{"result-set":{"docs":[
{"count(*)":8288,"dashboard_id":0},
{"count(*)":7392,"dashboard_id":2},
{"count(*)":23800,"dashboard_id":5},
{"count(*)":25032,"dashboard_id":6},
{"count(*)":8960,"dashboard_id":8},
{"count(*)":7840,"dashboard_id":10},
{"count(*)":17192,"dashboard_id":12},
{code}
So I'm guessing this isn't a supported syntax yet?
Also, probably a different issue, but I then tried this query:
{code}
select distinct dashboard_id, user_id from time_on_site_logs
{code}
and it took ~70 secs (m3.xlarge), which is very surprising because this
streaming expression (which basically does the initial query above):
{code}
select(rollup(facet(time_on_site_logs, q="*:*", buckets="dashboard_id,user_id",
bucketSizeLimit=2000, bucketSorts="dashboard_id asc",count(*)),
over="dashboard_id", count(*)), dashboard_id,count(*) as unique_users)
{code}
Returns in ~1.5 secs ... that's a huge difference in performance.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]