Joel Bernstein created SOLR-9384:
------------------------------------
Summary: Add randomization to the train Streaming Expression to
support very large training sets
Key: SOLR-9384
URL: https://issues.apache.org/jira/browse/SOLR-9384
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein
The *train* Streaming Expression optimizes a logistic regression model on text.
The initial implementation instantiates a doc vector for each document in the
training set on each iteration. The doc vectors are held in memory, so the size
of the training set is limited by available memory.
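To illustrate why memory caps the training set size, here is a minimal full-batch gradient-descent sketch for logistic regression. It assumes each doc vector is a dense double[] of term features with a 0/1 outcome. The class and method names are hypothetical and this is not the Solr implementation. The point is only that every iteration visits every in-memory vector.

```java
/**
 * Hypothetical sketch of full-batch logistic regression training.
 * All doc vectors are held in memory and visited on every iteration,
 * so memory use grows with the size of the training set.
 */
class FullBatchLogisticRegression {

    // Logistic (sigmoid) function.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * @param docs     one feature vector per document, all held in memory
     * @param outcomes 0/1 label for each document
     * @return learned weights; index 0 is the bias term
     */
    static double[] train(double[][] docs, int[] outcomes,
                          int iterations, double learningRate) {
        int numFeatures = docs[0].length;
        double[] weights = new double[numFeatures + 1];
        for (int iter = 0; iter < iterations; iter++) {
            double[] gradient = new double[weights.length];
            // Full pass over the entire in-memory training set.
            for (int d = 0; d < docs.length; d++) {
                // Log-loss gradient contribution: (p - y) * x
                double error = predict(weights, docs[d]) - outcomes[d];
                gradient[0] += error;
                for (int f = 0; f < numFeatures; f++) {
                    gradient[f + 1] += error * docs[d][f];
                }
            }
            // Average the gradient and take one descent step.
            for (int w = 0; w < weights.length; w++) {
                weights[w] -= learningRate * gradient[w] / docs.length;
            }
        }
        return weights;
    }

    // Probability that the document belongs to the positive class.
    static double predict(double[] weights, double[] doc) {
        double z = weights[0];
        for (int f = 0; f < doc.length; f++) {
            z += weights[f + 1] * doc[f];
        }
        return sigmoid(z);
    }
}
```

Because `train` takes the whole `docs` array, the caller must materialize every vector before the first iteration. This is the constraint the ticket aims to relax.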
This ticket will add randomization to the algorithm so that a random sample of
documents from the training set is processed on each iteration.
This will allow the train Streaming Expression to be run on much larger
training sets.
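The ticket does not say how the random set is drawn. One standard way to draw a fixed-size uniform sample per iteration without holding the whole training set in memory is reservoir sampling (Algorithm R), sketched below. The `ReservoirSampler` class and the `Iterator`-based document source are hypothetical stand-ins for a stream of doc vectors, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: draw a uniform random sample of at most sampleSize
 * documents from a stream using reservoir sampling (Algorithm R).
 * Only sampleSize items are kept in memory, however long the stream is.
 */
class ReservoirSampler {

    static <T> List<T> sample(Iterator<T> docs, int sampleSize, Random random) {
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        while (docs.hasNext()) {
            T doc = docs.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize documents.
                reservoir.add(doc);
            } else {
                // Pick an index in [0, seen); modulo bias is negligible
                // for realistic document counts.
                long j = Math.floorMod(random.nextLong(), seen);
                // Keep this doc with probability sampleSize / seen,
                // evicting a uniformly chosen current member.
                if (j < sampleSize) {
                    reservoir.set((int) j, doc);
                }
            }
        }
        return reservoir;
    }
}
```

In this sketch, each training iteration would call `sample` on a fresh pass over the training set and compute the gradient only over the returned documents. Memory is then bounded by `sampleSize` rather than by the size of the training set, at the cost of still reading the full stream once per iteration. Bernoulli sampling (keep each document independently with probability p) is a simpler alternative that yields a variable-sized batch.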
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)