[jira] [Commented] (SOLR-8492) Add LogisticRegressionQuery and LogitStream

Cao Manh Dat (JIRA) Sun, 10 Jan 2016 15:41:01 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091290#comment-15091290
 ]


Cao Manh Dat commented on SOLR-8492:
------------------------------------

He still uses gradient descent:
{code}
Gradient[i] = alpha*(sigmoid-outcome) * xi
{code}
We move in the descending direction of the gradient, taking steps 
proportional to the negative of the gradient, which means:
{code}
W[i] = W[i] - Gradient[i]
{code}
In the link you provided, he simply uses the negated form:
{code}
-Gradient[i] = alpha*(outcome-sigmoid) * xi
So W[i] = W[i] - Gradient[i] = W[i] + alpha * (outcome - sigmoid) * xi
{code}
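
To make the equivalence concrete, here is a minimal Java sketch of one update step written both ways. The class and method names are hypothetical and are not taken from the SOLR-8492 patch; alpha is assumed to be the learning rate and outcome a 0/1 label.

```java
// Hypothetical illustration only; not code from the SOLR-8492 patch.
class LogisticSgdStep {

    // Logistic (sigmoid) function.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Dot product of the weights and the feature values.
    static double dot(double[] w, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum;
    }

    // Form 1: Gradient[i] = alpha * (sigmoid - outcome) * x[i]; W[i] = W[i] - Gradient[i]
    static double[] stepSubtractGradient(double[] w, double[] x, double outcome, double alpha) {
        double s = sigmoid(dot(w, x));
        double[] updated = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double gradient = alpha * (s - outcome) * x[i];
            updated[i] = w[i] - gradient;
        }
        return updated;
    }

    // Form 2: W[i] = W[i] + alpha * (outcome - sigmoid) * x[i]
    static double[] stepAddNegatedGradient(double[] w, double[] x, double outcome, double alpha) {
        double s = sigmoid(dot(w, x));
        double[] updated = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            updated[i] = w[i] + alpha * (outcome - s) * x[i];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0};
        double[] x = {1.0, 2.0};
        double[] a = stepSubtractGradient(w, x, 1.0, 0.1);
        double[] b = stepAddNegatedGradient(w, x, 1.0, 0.1);
        System.out.println(java.util.Arrays.toString(a));
        System.out.println(java.util.Arrays.toString(b));
    }
}
```

Both methods move the weights by the same amount; the only difference is whether the sign is carried by the gradient term or by the update.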

> Add LogisticRegressionQuery and LogitStream
> -------------------------------------------
>
>                 Key: SOLR-8492
>                 URL: https://issues.apache.org/jira/browse/SOLR-8492
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>         Attachments: SOLR-8492.patch, SOLR-8492.patch, SOLR-8492.patch, 
> SOLR-8492.patch
>
>
> This ticket is to add a new query called a LogisticRegressionQuery (LRQ).
> The LRQ extends AnalyticsQuery 
> (http://joelsolr.blogspot.com/2015/12/understanding-solrs-analyticsquery.html)
>  and returns a DelegatingCollector that implements a Stochastic Gradient 
> Descent (SGD) optimizer for Logistic Regression.
> This ticket also adds the LogitStream which leverages Streaming Expressions 
> to provide iteration over the shards. Each call to LogitStream.read() calls 
> down to the shards and executes the LogisticRegressionQuery. The model data 
> is collected from the shards and the weights are averaged and sent back to 
> the shards with the next iteration. Each call to read() returns a Tuple with 
> the averaged weights and error from the shards. With this approach the 
> LogitStream streams the changing model back to the client after each 
> iteration.
> The LogitStream will return the EOF Tuple when it reaches the defined 
> maxIterations. When sent as a Streaming Expression to the Stream handler this 
> provides parallel iterative behavior. This same approach can be used to 
> implement other parallel iterative algorithms.
> The initial patch has a test which simply tests the mechanics of the 
> iteration. More work will need to be done to ensure the SGD is properly 
> implemented. The distributed approach of the SGD will also need to be 
> reviewed.  
> This implementation is designed for use cases with a small number of features 
> because each feature is its own discrete field.
> An implementation which supports a higher number of features would be 
> possible by packing features into a byte array and storing as binary 
> DocValues.
> This implementation is designed to support a large sample set. With a large 
> number of shards, a sample set into the billions may be possible.
> Sample Streaming Expression syntax:
> {code}
> logit(collection1, features="a,b,c,d,e,f", outcome="x", maxIterations="80")
> {code}
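
The ticket description quoted above says that the weights collected from the shards are averaged before being sent back for the next iteration. A minimal Java sketch of that averaging step follows, assuming each shard returns a weight vector of the same length. The class and method names are hypothetical and do not come from the patch, and whether the patch uses a plain or a weighted mean is not stated in the ticket.

```java
// Hypothetical illustration only; not code from the SOLR-8492 patch.
class ShardWeightAveraging {

    // Element-wise mean of the per-shard weight vectors.
    // Assumes at least one shard and equal-length vectors.
    static double[] average(double[][] shardWeights) {
        int numFeatures = shardWeights[0].length;
        double[] averaged = new double[numFeatures];
        for (double[] weights : shardWeights) {
            for (int i = 0; i < numFeatures; i++) {
                averaged[i] += weights[i];
            }
        }
        for (int i = 0; i < numFeatures; i++) {
            averaged[i] /= shardWeights.length;
        }
        return averaged;
    }

    public static void main(String[] args) {
        double[][] fromShards = {
            {1.0, 2.0},  // weights reported by shard 1
            {3.0, 4.0}   // weights reported by shard 2
        };
        System.out.println(java.util.Arrays.toString(average(fromShards)));
    }
}
```

In an iterative setup like the one described, the averaged vector would be the starting point each shard receives for its next pass.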


