[jira] [Commented] (SOLR-7560) Parallel SQL Support

Dennis Gove (JIRA) Tue, 09 Jun 2015 04:31:14 -0700

    [ https://issues.apache.org/jira/browse/SOLR-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578750#comment-14578750 ]


Dennis Gove commented on SOLR-7560:
-----------------------------------

Possible expression syntax for the RollupStream

{code}
rollup(
  someStream(....),
  over="fieldA, fieldB, fieldC",
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}
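
As a rough illustration of what such a rollup would compute, here is a minimal Java sketch that groups tuples by the "over" fields and keeps running min, max, sum and mean for one metric field per group. It is not Solr's RollupStream: the names RollupSketch, Stats and rollup are hypothetical, tuples are modeled as plain maps, and hash-based grouping is used only for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Solr's RollupStream.
class RollupSketch {

    // Running min, max, sum and count for one numeric field within one group.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    // Groups tuples by the values of the "over" fields and accumulates
    // statistics for a single metric field in each group.
    // Tuples whose metric field is missing or non-numeric are skipped.
    static Map<List<Object>, Stats> rollup(List<Map<String, Object>> tuples,
                                           List<String> over,
                                           String metricField) {
        Map<List<Object>, Stats> groups = new LinkedHashMap<>();
        for (Map<String, Object> tuple : tuples) {
            // The group key is the list of values of the "over" fields.
            List<Object> key = new ArrayList<>();
            for (String field : over) {
                key.add(tuple.get(field));
            }
            Object value = tuple.get(metricField);
            if (value instanceof Number) {
                groups.computeIfAbsent(key, k -> new Stats())
                      .add(((Number) value).doubleValue());
            }
        }
        return groups;
    }
}
```

In the proposed syntax each metric (min, max, mean, sum) would be its own expressible object; the sketch folds them into one Stats holder just to keep the example short.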

This would require making the *Metric types Expressible, but I think that ends 
up being a good thing. It would make it easy to support other options on 
metrics, such as excluding outliers. For example, the sum of values within 3 
standard deviations of the mean could be 
{code}
sum(fieldC, limit=standardDev(3))
{code}
(Note: how that particular calculation could be implemented is left as an 
exercise for the reader; I'm just using it as an example of adding 
options to a relatively simple metric.)
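
For what it's worth, one way an outlier-limited sum could work is a two-pass calculation: compute the mean and standard deviation first, then sum only the values within k standard deviations of the mean. The Java sketch below assumes the population standard deviation and an inclusive bound; the class and method names are hypothetical and nothing here reflects an actual Solr metric.

```java
// Hypothetical illustration of a sum that ignores outliers.
class OutlierLimitedSum {

    // Sums the values lying within k population standard deviations
    // of the mean (bound inclusive). Returns 0.0 for an empty input.
    static double sumWithinStdDevs(double[] values, double k) {
        if (values.length == 0) {
            return 0.0;
        }

        // First pass: mean.
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        // Second pass: population variance.
        double variance = 0.0;
        for (double v : values) {
            variance += (v - mean) * (v - mean);
        }
        variance /= values.length;

        // Third pass: sum only the values inside the allowed band.
        double limit = k * Math.sqrt(variance);
        double sum = 0.0;
        for (double v : values) {
            if (Math.abs(v - mean) <= limit) {
                sum += v;
            }
        }
        return sum;
    }
}
```

A streaming implementation would not be able to hold all values in an array like this; it would need either two passes over the stream or an approximation, which is part of why the calculation is non-trivial.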
Another example of an option is how to handle null values. In some cases a 
null should not affect the mean, but in others it should. You could 
express those as
{code}
// replace null values with 0 thus leading to an impact on the mean
mean(fieldA, replace(null, 0))
// nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull="true")
// nulls neither counted in denominator nor added to numerator
mean(fieldA, includeNull="false")
// if fieldA is null replace it with fieldB, include null fieldB in mean
mean(fieldA, replace(null, fieldB), includeNull="true")
{code}
and so on.
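
To make the null-handling semantics concrete, here is a minimal Java sketch of a mean with an optional replacement value and an includeNull flag. The class and method names are hypothetical, the replacement is a constant rather than another field, and this only mirrors the behaviors described above; it is not an existing Solr metric.

```java
import java.util.List;

// Hypothetical illustration of the null-handling options for a mean.
class NullAwareMean {

    // values:      the field values, possibly containing nulls
    // replacement: value substituted for a null, or null for no substitution
    // includeNull: if true, a remaining null counts in the denominator
    //              but adds nothing to the numerator
    static double mean(List<Double> values, Double replacement, boolean includeNull) {
        double sum = 0.0;
        long count = 0;
        for (Double raw : values) {
            Double v = (raw != null) ? raw : replacement;
            if (v != null) {
                sum += v;
                count++;
            } else if (includeNull) {
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For the values 2, null, 4, calling mean with no replacement and includeNull false gives 3, while includeNull true gives 2 because the null still counts in the denominator.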

> Parallel SQL Support
> --------------------
>
>                 Key: SOLR-7560
>                 URL: https://issues.apache.org/jira/browse/SOLR-7560
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, search
>            Reporter: Joel Bernstein
>             Fix For: 5.3
>
>         Attachments: SOLR-7560.patch
>
>
> This ticket provides support for executing *Parallel SQL* queries across 
> SolrCloud collections. The SQL engine will be built on top of the Streaming 
> API (SOLR-7082), which provides support for *parallel relational algebra* and 
> *real-time map-reduce*.
> Basic design:
> 1) A new SQLHandler will be added to process SQL requests. The SQL statements 
> will be compiled to live Streaming API objects for parallel execution across 
> SolrCloud worker nodes.
> 2) SolrCloud collections will be abstracted as *Relational Tables*. 
> 3) The Presto SQL parser will be used to parse the SQL statements.
> 4) A JDBC thin client will be added as a Solrj client.
> This ticket will focus on putting the framework in place and providing basic 
> SELECT support and GROUP BY aggregate support.
> Future releases will build on this framework to provide additional SQL 
> features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)