[jira] [Updated] (SOLR-9252) Feature selection and logistic regression on text

Joel Bernstein (JIRA) Thu, 04 Aug 2016 12:08:06 -0700

     [ https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joel Bernstein updated SOLR-9252:
---------------------------------
    Description: 
This ticket adds two new streaming expressions: *features* and *train*.

The two functions work together to train a logistic regression model on text, 
using a training set stored in a SolrCloud collection.

The syntax is as follows:

{code}
train(collection1, q="*:*",
      features(collection1, 
               q="*:*",  
               field="tv_text", 
               outcome="out_i", 
               positiveLabel=1, 
               numTerms=100),
      field="tv_text",
      outcome="out_i",
      maxIterations=100)
{code}
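
For anyone who wants to generate the expression above from client code, here is a minimal Python sketch. The helper names (build_train_expr, stream_url) and the base URL are hypothetical and used only for illustration. The sketch assumes the expression is sent as the expr parameter of the collection's /stream handler, and it only builds the request URL without sending anything.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_train_expr(collection, field, outcome, q="*:*",
                     positive_label=1, num_terms=100, max_iterations=100):
    """Assemble the train(features(...)) expression from the ticket as a string."""
    features = (
        f'features({collection}, q="{q}", field="{field}", '
        f'outcome="{outcome}", positiveLabel={positive_label}, '
        f'numTerms={num_terms})'
    )
    return (
        f'train({collection}, q="{q}", {features}, '
        f'field="{field}", outcome="{outcome}", '
        f'maxIterations={max_iterations})'
    )


def stream_url(base_url, collection, expr):
    """Build a URL for the collection's /stream handler.

    Assumption: the expression travels in the `expr` query parameter.
    """
    return f"{base_url}/{collection}/stream?{urlencode({'expr': expr})}"


# Hypothetical host and port; adjust to the actual Solr node.
expr = build_train_expr("collection1", "tv_text", "out_i")
url = stream_url("http://localhost:8983/solr", "collection1", expr)
print(url)
```

Keeping the expression builder separate from the URL builder makes it easy to log or inspect the expression before it is sent.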

The *features* function extracts feature terms from a training set, using 
*information gain* to score the terms. Background on feature selection: 
http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf
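
To make the scoring idea concrete, here is a small Python sketch of the textbook information gain formula for a binary outcome: the entropy of the labels minus the weighted entropy after splitting documents by whether they contain the term. This is not taken from the patch and may differ from how the features function computes its scores; the function names and the toy data are made up for illustration.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class count."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """docs: list of term sets, labels: list of 0/1 outcomes."""
    n = len(docs)
    if n == 0:
        return 0.0
    with_pos = with_neg = without_pos = without_neg = 0
    for terms, label in zip(docs, labels):
        if term in terms:
            if label == 1:
                with_pos += 1
            else:
                with_neg += 1
        elif label == 1:
            without_pos += 1
        else:
            without_neg += 1
    n_with = with_pos + with_neg
    n_without = n - n_with
    prior = entropy(with_pos + without_pos, with_neg + without_neg)
    conditional = ((n_with / n) * entropy(with_pos, with_neg)
                   + (n_without / n) * entropy(without_pos, without_neg))
    return prior - conditional


def top_terms(docs, labels, num_terms):
    """Return the num_terms highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:num_terms]


# Toy training set: two positive and two negative documents.
docs = [{"solr", "fast"}, {"solr", "search"}, {"slow", "db"}, {"db", "search"}]
labels = [1, 1, 0, 0]
print(top_terms(docs, labels, 2))
```

In the toy data, "solr" and "db" each split the labels perfectly, while "search" appears equally in both classes and carries no information.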

The *train* function uses the extracted features to train a logistic regression 
model on a text field in the training set.
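
As a rough illustration of what training on the extracted terms means, the sketch below fits a logistic regression model with plain batch gradient descent over term-presence vectors. It is a generic example, not the optimizer used in the patch; the learning_rate parameter, the helper names and the toy data are assumptions, and only max_iterations mirrors the maxIterations parameter of the expression.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def vectorize(doc_terms, feature_terms):
    """Encode a document as 0/1 presence flags over the selected feature terms."""
    return [1.0 if term in doc_terms else 0.0 for term in feature_terms]


def predict(weights, x):
    """Probability of the positive label; the last weight is the bias."""
    z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def train_logistic(rows, labels, max_iterations=100, learning_rate=0.5):
    """Batch gradient descent on the logistic loss (learning_rate is an assumed knob)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(max_iterations):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
            gradient[-1] += error  # bias term
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Toy data: the feature terms stand in for the output of feature selection.
feature_terms = ["solr", "db"]
docs = [{"solr", "fast"}, {"solr", "search"}, {"slow", "db"}, {"db", "search"}]
labels = [1, 1, 0, 0]
rows = [vectorize(d, feature_terms) for d in docs]
weights = train_logistic(rows, labels, max_iterations=100)
print(predict(weights, vectorize({"solr"}, feature_terms)))
```

The resulting weights, one per feature term plus a bias, are the kind of compact model that could be stored as a document alongside the feature terms.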

Both the features and the models can be stored in a SolrCloud collection. With 
this approach, Solr can hold millions of models, which can be selectively 
deployed.

  was:
This ticket adds two new streaming expressions *features* and *train*

These two functions work together to train a logistic regression model on text, 
from a training set stored in a SolrCloud collection.

The syntax is as follows:

{code}
train(collection1, q="*:*",
      features(collection1, 
               q="*:*",  
               field="tv_text", 
               outcome="out_i", 
               positiveLabel=1, 
               numTerms=100),
      field="tv_text",
      outcome="out_i",
      maxIterations=100)
{code}

The *features* function extracts the feature terms from a training set using 
*information gain* to score the terms. 
http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf

The *train* function uses the extracted features to train a logistic regression 
model on a text field in the training set.

Both the features and the models can be stored in a SolrCloud collection. Using 
this approach Solr can hold millions of models which can be selectively 
deployed.

> Feature selection and logistic regression on text
> -------------------------------------------------
>
>                 Key: SOLR-9252
>                 URL: https://issues.apache.org/jira/browse/SOLR-9252
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search, SolrCloud, SolrJ
>            Reporter: Cao Manh Dat
>            Assignee: Joel Bernstein
>              Labels: Streaming
>             Fix For: 6.2
>
>         Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch


