[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trey Grainger updated SOLR-9418:
--------------------------------
Attachment: SOLR-9418.patch
> Statistical Phrase Identifier
> -----------------------------
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Akash Mehta
> Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.zip
>
>
> The Statistical Phrase Identifier is a Solr contribution that takes in a
> string of text and then leverages a language model (an Apache Lucene/Solr
> inverted index) to predict how the input text should be divided into
> phrases. The intended purpose of this tool is to parse short-text queries
> into phrases prior to executing a keyword search (as opposed to parsing out
> each keyword as a single term).
> History
> This project was originally implemented at CareerBuilder in the summer of
> 2015 for use as part of their semantic search system. In 2018
>
> The main aim of this requestHandler is to find the best parsing for a given
> query, i.e., to recognize the distinct phrases within the query. Some
> training data is needed to generate these phrases. The project works as
> follows:
> 1.) Generate all possible parsings for the given query.
> 2.) For each possible parsing, calculate a naive-Bayes-like score.
> 3.) The main scoring goes through all the documents in the training set and
> finds the probability of a group of words occurring together as a phrase,
> compared with the probability of them occurring randomly in the same
> document. The score is then normalized. The title field is weighted more
> heavily than the content field; this weighting is configurable.
> 4.) Finally, after every possible parsing has been scored, the one with the
> highest score is returned.
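The four steps above can be sketched in Python roughly as follows. This is a minimal, self-contained sketch and not the code in SOLR-9418.patch: it assumes a small in-memory corpus of tokenized `title`/`content` fields in place of a Lucene index. The function names, the Laplace smoothing, the boundary factor (1 - P(phrase) for each split between adjacent words), the 2:1 title-to-content weighting, and normalization by averaging log-factors are all illustrative assumptions.

```python
import math
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Phrase = Tuple[str, ...]
Doc = Dict[str, List[str]]  # field name -> tokenized field text (assumed layout)


def segmentations(tokens: Sequence[str]) -> Iterator[List[Phrase]]:
    """Step 1: yield every split of tokens into contiguous phrases (2**(n-1) total)."""
    if not tokens:
        yield []
        return
    for cuts in product([False, True], repeat=len(tokens) - 1):
        segs: List[Phrase] = []
        start = 0
        for i, cut in enumerate(cuts, start=1):
            if cut:  # a phrase boundary falls before tokens[i]
                segs.append(tuple(tokens[start:i]))
                start = i
        segs.append(tuple(tokens[start:]))
        yield segs


def contains_sequence(field: List[str], phrase: Phrase) -> bool:
    """True if the words of phrase appear contiguously, in order, in field."""
    k = len(phrase)
    return any(tuple(field[i:i + k]) == phrase for i in range(len(field) - k + 1))


def phrase_probability(phrase: Phrase, docs: List[Doc],
                       weights: Dict[str, float], alpha: float = 1.0) -> float:
    """Step 3: among docs where all words co-occur in a field, the fraction where
    they form the exact phrase; fields are combined as a weighted average."""
    total = 0.0
    for field, weight in weights.items():
        cooccur = contiguous = 0
        for doc in docs:
            toks = doc.get(field, [])
            if all(word in toks for word in phrase):
                cooccur += 1
                if contains_sequence(toks, phrase):
                    contiguous += 1
        # Laplace smoothing (an assumption) keeps the estimate strictly in (0, 1).
        total += weight * (contiguous + alpha) / (cooccur + 2 * alpha)
    return total / sum(weights.values())


def score(segs: List[Phrase], docs: List[Doc], weights: Dict[str, float]) -> float:
    """Step 2: naive-Bayes-style score, independent factors multiplied in log space."""
    logs = [math.log(phrase_probability(s, docs, weights)) for s in segs if len(s) > 1]
    # Each boundary between phrases contributes P(adjacent words are NOT a phrase).
    for left, right in zip(segs, segs[1:]):
        p = phrase_probability((left[-1], right[0]), docs, weights)
        logs.append(math.log(1.0 - p))
    # Normalize by averaging, so parsings with different factor counts compare fairly.
    return sum(logs) / len(logs) if logs else 0.0


def best_parsing(tokens: Sequence[str], docs: List[Doc],
                 weights: Dict[str, float]) -> List[Phrase]:
    """Step 4: return the highest-scoring parsing."""
    return max(segmentations(tokens), key=lambda s: score(s, docs, weights))


# Toy training set (hypothetical data for illustration only).
DOCS: List[Doc] = [
    {"title": ["java", "developer"],
     "content": ["hiring", "java", "developer", "in", "new", "york"]},
    {"title": ["senior", "java", "developer"], "content": ["new", "york", "office"]},
    {"title": ["new", "york", "sales"], "content": ["java", "coffee", "developer", "tools"]},
]
WEIGHTS = {"title": 2.0, "content": 1.0}  # title counts double (illustrative value)

print(best_parsing(["java", "developer", "new", "york"], DOCS, WEIGHTS))
# [('java', 'developer'), ('new', 'york')]
```

Enumerating every parsing is exponential in query length, which is tolerable for the short-text queries the tool targets; a longer input would call for dynamic programming over split points instead.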
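The stated purpose of the tool is to split a short-text query into phrases before the keyword search runs. A hypothetical helper, `to_phrase_query` (not part of the patch), illustrates what that changes: given an already-chosen parsing, it emits a query string in Lucene's classic query syntax, quoting each multi-word phrase so it is matched as a phrase rather than as independent keywords.

```python
from typing import List


def to_phrase_query(phrases: List[str]) -> str:
    """Render a parsed query as a Lucene classic-syntax query string.

    Multi-word phrases become quoted phrase queries; single words stay plain
    keyword terms. (Hypothetical helper for illustration.)
    """
    parts = []
    for phrase in phrases:
        words = phrase.split()
        if not words:
            continue  # skip empty segments
        if len(words) == 1:
            parts.append(words[0])
        else:
            # Backslash-escape embedded double quotes so the phrase stays well-formed.
            escaped = " ".join(words).replace('"', '\\"')
            parts.append(f'"{escaped}"')
    return " ".join(parts)


# Keyword-only parsing vs. phrase-aware parsing of the same query:
print(to_phrase_query(["senior", "java", "developer", "new", "york"]))
# senior java developer new york
print(to_phrase_query(["senior", "java developer", "new york"]))
# senior "java developer" "new york"
```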
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)