[ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
----------------------------------
    Description: 
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each training 
iteration. Each training iteration involves building a matrix on each shard 
with the number of rows equal to the size of the training set contained on the 
shard. The number of columns will be the number of features. This scenario can 
create very large matrices when working with large training sets and feature 
sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set. This 
will allow for much larger training sets.

  was:
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each pass. Each 
iteration involves building a matrix on each shard with the number of rows 
equal to the size of the training set contained on the shard. The number of 
columns will be the number of features. This scenario can create very large 
matrices when working with large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set. This 
will allow for much larger training sets.


> Implement sampling for logistic regression classifier
> -----------------------------------------------------
>
>                 Key: SOLR-12197
>                 URL: https://issues.apache.org/jira/browse/SOLR-12197
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>             Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each training 
> iteration. Each training iteration involves building a matrix on each shard 
> with the number of rows equal to the size of the training set contained on 
> the shard. The number of columns will be the number of features. This 
> scenario can create very large matrices when working with large training sets 
> and feature sets.
>
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set. This 
> will allow for much larger training sets.
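
The idea in the description can be illustrated with a minimal single-process sketch. It is not Solr's implementation and does not model shards. The function names (`train_with_sampling`, `predict`), the `sample_size` argument, and the plain gradient-descent update are assumptions made for illustration; the ticket does not say whether *sample* is a row count or a fraction, so a row count is assumed here. The point is only that the matrix built on each iteration has `sample_size` rows rather than one row per training example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, x):
    # Probability that feature vector x belongs to the positive class.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_with_sampling(rows, labels, sample_size,
                        iterations=200, learning_rate=0.5, seed=0):
    # Hypothetical helper: logistic regression trained by gradient descent,
    # where every iteration sees only a random sample of the training set.
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    k = min(sample_size, len(rows))

    for _ in range(iterations):
        # Draw k distinct row indices for this iteration.
        picked = rng.sample(range(len(rows)), k)

        # The per-iteration matrix: k rows, n_features columns.
        matrix = [rows[i] for i in picked]
        targets = [labels[i] for i in picked]

        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(matrix, targets):
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error

        # Average the gradient over the sample and take one step.
        weights = [w - learning_rate * g / k for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / k

    return weights, bias


if __name__ == "__main__":
    data_rng = random.Random(42)
    xs = [[data_rng.uniform(-1.0, 1.0)] for _ in range(1000)]
    ys = [1 if x[0] > 0 else 0 for x in xs]
    w, b = train_with_sampling(xs, ys, sample_size=50)
    print(predict(w, b, [0.8]), predict(w, b, [-0.8]))
```

With `sample_size=50` the sketch never materializes more than 50 rows at a time, regardless of how many training examples exist. The trade-off is a noisier gradient on each iteration.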



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
