[jira] [Updated] (SOLR-9384) Add randomization to the train Streaming Expression to support very large training sets

Joel Bernstein (JIRA) Thu, 04 Aug 2016 09:59:42 -0700

     [ https://issues.apache.org/jira/browse/SOLR-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joel Bernstein updated SOLR-9384:
---------------------------------
    Description: 
The *train* (SOLR-9252) Streaming Expression optimizes a logistic regression 
model on text.

The initial implementation instantiates a doc vector for each document in the 
training set on each iteration. The doc vectors are held in memory, so the size 
of the training set is limited by available memory.

This ticket will add randomization to the algorithm so that a random subset of 
the training documents is processed on each iteration.

This will allow the train Streaming Expression to be run on much larger 
training sets.

  was:
The *train* Streaming Expression optimizes a logistic regression model on text.

The initial implementation instantiates a doc vector for each document in the 
training set on each iteration. The doc vectors are held in memory, so the size 
of the training set is limited by available memory.

This ticket will add randomization to the algorithm so that a random subset of 
the training documents is processed on each iteration.

This will allow the train Streaming Expression to be run on much larger 
training sets.
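
The idea in the description can be illustrated with a small sketch. This is 
not the Solr implementation (which is Java and is not shown in this message); 
it is a plain Python mini-batch gradient descent for logistic regression, 
assuming simple term-presence doc vectors. All names here (train_sampled, 
doc_vector, sample_size, lr) are hypothetical. The point it shows is that doc 
vectors are built only for the randomly sampled documents on each iteration, 
so memory use depends on the sample size rather than on the training set size.

```python
import math
import random


def doc_vector(doc, vocab):
    """Build a sparse term-presence vector (term index -> 1.0) for one document."""
    return {vocab[t]: 1.0 for t in set(doc.split()) if t in vocab}


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_sampled(docs, labels, vocab, iterations=200, sample_size=4,
                  lr=0.5, seed=42):
    """Hypothetical sketch: logistic regression trained on a random sample
    of the training set per iteration (assumed design, not Solr's code)."""
    rng = random.Random(seed)
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = min(sample_size, len(docs))
    for _ in range(iterations):
        # Pick a fresh random subset of document indexes for this iteration.
        batch = rng.sample(range(len(docs)), n)
        grad = {}
        grad_bias = 0.0
        for i in batch:
            # Only sampled documents are vectorized, one at a time.
            vec = doc_vector(docs[i], vocab)
            score = bias + sum(weights[j] * v for j, v in vec.items())
            error = sigmoid(score) - labels[i]
            grad_bias += error
            for j, v in vec.items():
                grad[j] = grad.get(j, 0.0) + error * v
        # Average the gradient over the sample and take one descent step.
        bias -= lr * grad_bias / n
        for j, g in grad.items():
            weights[j] -= lr * g / n
    return weights, bias


def predict(doc, vocab, weights, bias):
    """Return the modeled probability that doc belongs to the positive class."""
    vec = doc_vector(doc, vocab)
    return sigmoid(bias + sum(weights[j] * v for j, v in vec.items()))
```

In this sketch, sample_size trades memory for gradient noise: smaller samples 
hold fewer doc vectors at once but may need more iterations to converge.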


> Add randomization to the train Streaming Expression to support very large 
> training sets
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-9384
>                 URL: https://issues.apache.org/jira/browse/SOLR-9384
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Joel Bernstein


