[
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-9252:
---------------------------------
Description:
This ticket adds two new streaming expressions: *features* and *train*.
These two functions work together to train a logistic regression model on text,
from a training set stored in a SolrCloud collection.
The syntax is as follows:
{code}
train(collection1, q="*:*",
      features(collection1,
               q="*:*",
               field="tv_text",
               outcome="out_i",
               positiveLabel=1,
               numTerms=100),
      field="tv_text",
      outcome="out_i",
      maxIterations=100)
{code}
The *features* function extracts the feature terms from a training set using
*information gain* to score the terms.
http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf
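To make the scoring idea concrete, here is a minimal Python sketch of information gain for a binary outcome. It is not the code in the attached patch: the function names (entropy, information_gain, top_terms) and the representation of each document as a set of terms are assumptions made for illustration only. A term scores highly when knowing whether it occurs in a document removes most of the uncertainty about the outcome.

```python
import math
from typing import List, Set


def entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a group with `pos` positive and `neg` negative docs."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs: List[Set[str]], labels: List[int], term: str) -> float:
    """Drop in outcome entropy from splitting docs on presence of `term`."""
    if not docs:
        return 0.0
    with_pos = with_neg = without_pos = without_neg = 0
    for terms, label in zip(docs, labels):
        present = term in terms
        if present and label == 1:
            with_pos += 1
        elif present:
            with_neg += 1
        elif label == 1:
            without_pos += 1
        else:
            without_neg += 1
    n = len(docs)
    n_with = with_pos + with_neg
    n_without = without_pos + without_neg
    # Entropy that remains once we know whether the term is present.
    conditional = (n_with / n) * entropy(with_pos, with_neg) \
        + (n_without / n) * entropy(without_pos, without_neg)
    prior = entropy(with_pos + without_pos, with_neg + without_neg)
    return prior - conditional


def top_terms(docs: List[Set[str]], labels: List[int], num_terms: int) -> List[str]:
    """Return the `num_terms` highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:num_terms]
```

In this sketch, num_terms plays the role that numTerms plays in the expression above, and a label of 1 corresponds to positiveLabel=1.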
The *train* function uses the extracted features to train a logistic regression
model on a text field in the training set.
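As a rough illustration of the training step, the following Python sketch fits a logistic regression model by batch gradient descent over binary term-presence features. It is an assumption-laden sketch, not the implementation in the patch: the names train_logistic and predict, the learning_rate parameter, the bias term, and the use of 0/1 presence features are all choices made here for the example.

```python
import math
from typing import List, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_row(doc: Set[str], feature_terms: List[str]) -> List[float]:
    """Encode a doc as 0/1 term-presence features plus a constant bias input."""
    return [1.0 if t in doc else 0.0 for t in feature_terms] + [1.0]


def train_logistic(docs: List[Set[str]], labels: List[int],
                   feature_terms: List[str],
                   max_iterations: int = 100,
                   learning_rate: float = 0.5) -> List[float]:
    """Return one weight per feature term, followed by the bias weight."""
    weights = [0.0] * (len(feature_terms) + 1)
    rows = [to_row(d, feature_terms) for d in docs]
    if not rows:
        return weights
    for _ in range(max_iterations):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            # Prediction error for this doc drives the log-loss gradient.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


def predict(weights: List[float], feature_terms: List[str], doc: Set[str]) -> float:
    """Probability that `doc` belongs to the positive class."""
    x = to_row(doc, feature_terms)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))
```

Here max_iterations corresponds to maxIterations in the expression above, and feature_terms stands in for the output of the *features* function.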
Both the features and the models can be stored in a SolrCloud collection. Using
this approach, Solr can hold millions of models, which can be selectively
deployed.
was:
This ticket adds two new streaming expressions *features* and *train*
These two functions work together to train a logistic regression model on text,
from a training set stored in a SolrCloud collection.
The syntax is as follows:
{code}
train(collection1, q="*:*",
features(collection1,
q="*:*",
field="tv_text",
outcome="out_i",
positiveLabel=1,
numTerms=100),
field="tv_text",
outcome="out_i",
maxIterations=100)
{code}
The *features* function extracts the feature terms from a training set using
*information gain* to score the terms.
http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf
The *train* function uses the extracted features to train a logistic regression
model on a text field in the training set.
Both the features and the models can be stored in a SolrCloud collection. Using
this approach Solr can hold millions of models which can be selectively
deployed.
> Feature selection and logistic regression on text
> -------------------------------------------------
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: search, SolrCloud, SolrJ
> Reporter: Cao Manh Dat
> Assignee: Joel Bernstein
> Labels: Streaming
> Fix For: 6.2
>
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch,
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch,
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch