[
https://issues.apache.org/jira/browse/SOLR-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-10651:
----------------------------------
Description:
This is a ticket for organizing the new statistical programming features of
Streaming Expressions. It's also a place for the community to discuss what
functions are needed to support statistical programming.
Basic Syntax:
{code}
let(a = timeseries(...),
    b = timeseries(...),
    c = col(a, count(*)),
    d = col(b, count(*)),
    r = regress(c, d),
    tuple(p = predict(r, 50)))
{code}
The expression above does the following:
1) The let expression is setting variables (a, b, c, d, r).
2) Variables *a* and *b* are the output of timeseries() Streaming Expressions.
These will be stored in memory as lists of Tuples containing the time series
results.
3) Variables *c* and *d* are set using the *col* evaluator. The col evaluator
extracts a column of numbers from a list of tuples. In the example *col* is
extracting the count(*) field from the two time series result sets.
4) Variable *r* is the output from the *regress* evaluator. The regress
evaluator performs a simple regression analysis on two columns of numbers.
5) Once the variables are set, a single Streaming Expression is run by the
*let* expression. In the example the *tuple* expression is run. The tuple
expression outputs a single Tuple with name/value pairs. Any Streaming
Expression can be run by the *let* expression so this can be a complex program.
The streaming expression run by *let* has access to all the variables defined
earlier.
6) The tuple expression in the example has one name/value pair. The name *p*
is set to the output of the *predict* evaluator. The predict evaluator
predicts the value of the dependent variable when the independent variable is
50. The regression result stored in variable *r* is used to make the prediction.
7) The output of this expression will be a single tuple with the value of the
predict function in the *p* field.
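To make the steps above concrete, here is a rough Python sketch of the same data flow. It is not how Solr implements these evaluators. The sample tuples are made-up stand-ins for the timeseries(...) results, the helper names simply mirror the evaluator names, and treating the first column passed to regress as the independent variable is an assumption.

```python
from typing import Dict, List

# Hypothetical stand-ins for the two timeseries(...) result sets.
# Each result set is a list of tuples (here: dicts) with a count(*) field.
a = [{"count(*)": 10}, {"count(*)": 20}, {"count(*)": 30}]
b = [{"count(*)": 25}, {"count(*)": 45}, {"count(*)": 65}]


def col(tuples: List[Dict[str, float]], field: str) -> List[float]:
    """Extract one numeric column from a list of tuples."""
    return [float(t[field]) for t in tuples]


def regress(x: List[float], y: List[float]) -> Dict[str, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Assumption: x is the independent column and y the dependent column.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("independent column has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(r: Dict[str, float], value: float) -> float:
    """Predict the dependent variable for one value of the independent variable."""
    return r["intercept"] + r["slope"] * value


# Same shape as the let(...) expression: set variables, then emit one tuple.
c = col(a, "count(*)")
d = col(b, "count(*)")
r = regress(c, d)
result = {"p": predict(r, 50)}
print(result)
```

With this sample data the second column is exactly 2 * x + 5, so the fitted line recovers slope 2 and intercept 5.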
The issues linked to this ticket are the array manipulation and statistical
functions that will form the basis of the stats library. The vast majority of
these functions are backed by algorithms in Apache Commons Math. Other machine
learning and math libraries will follow.
> Streaming Expressions statistical functions library
> ---------------------------------------------------
>
> Key: SOLR-10651
> URL: https://issues.apache.org/jira/browse/SOLR-10651
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Fix For: master (7.0)