[ https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Bernstein updated SOLR-9252:
---------------------------------
    Description:
This ticket adds two new streaming expressions: *features* and *train*.

These two functions work together to train a logistic regression model on text, from a training set stored in a SolrCloud collection.

The syntax is as follows:

{code}
train(collection1, q="*:*",
      features(collection1,
               q="*:*",
               field="tv_text",
               outcome="out_i",
               positiveLabel=1,
               numTerms=100),
      field="tv_text",
      outcome="out_i",
      maxIterations=100)
{code}

The *features* function extracts the feature terms from a training set, using *information gain* to score the terms:
http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf

The *train* function uses the extracted features to train a logistic regression model on a text field in the training set.

Both the features and the models can be stored in a SolrCloud collection. Using this approach, Solr can hold millions of models, which can be selectively deployed.
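For readers unfamiliar with information gain, here is a minimal Python sketch of how terms might be scored and the top numTerms kept. It is an illustration only, not the code in the attached patch: the function names (entropy, information_gain, top_terms), the in-memory document representation, and the use of simple term presence are all assumptions made for the example.

```python
import math


def entropy(pos, neg):
    """Binary entropy, in bits, of a set with `pos` positive and `neg` negative docs."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, term):
    """Score `term` by how much knowing its presence reduces label entropy.

    docs: list of (set_of_terms, label) pairs, label in {0, 1}.
    This representation is hypothetical, chosen to keep the sketch small.
    """
    n = len(docs)
    if n == 0:
        return 0.0
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]

    def label_entropy(labels):
        positives = sum(labels)
        return entropy(positives, len(labels) - positives)

    all_labels = [label for _, label in docs]
    conditional = (len(with_term) / n) * label_entropy(with_term) \
        + (len(without_term) / n) * label_entropy(without_term)
    return label_entropy(all_labels) - conditional


def top_terms(docs, num_terms):
    """Return the `num_terms` highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*(terms for terms, _ in docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, t), t))
    return ranked[:num_terms]
```

A term that appears only in positive (or only in negative) documents gets a high score, while a term spread evenly across both classes scores near zero. The selected terms would then serve as the inputs to the logistic regression step.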
> Feature selection and logistic regression on text
> -------------------------------------------------
>
>                 Key: SOLR-9252
>                 URL: https://issues.apache.org/jira/browse/SOLR-9252
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search, SolrCloud, SolrJ
>            Reporter: Cao Manh Dat
>            Assignee: Joel Bernstein
>              Labels: Streaming
>             Fix For: 6.2
>
>         Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch,
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch,
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch