[
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591921#comment-15591921
]
Doug Turnbull commented on SOLR-9418:
-------------------------------------
Looking at your patch (I'm not a committer, just curious about it), a few
things jump out on a shallow reading that would probably need to change for
this to be accepted:
- Field names and thresholds likely need to be configurable, as most folks
won't necessarily have fields named exactly "title" or "content".
- Can this be a qparser plugin instead of a request handler? It's likely I'd
want to use it alongside other qparsers and SearchComponents (like highlighting
or facets).
- Can you provide some documentation on how the thresholds work and how they
can be configured?
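The following is a minimal sketch of the first point. It assumes the handler (or a QParser plugin) receives its settings as string parameters. Field names and weights are then read from a single `field^weight` list instead of being hard-coded. This is the same syntax Solr's dismax/edismax `qf` parameter uses. The parameter names `pqp.fields` and `pqp.minPhraseScore` are hypothetical, as are their defaults and the reading of "threshold" as a single minimum phrase score. Python is used for brevity rather than the Java a real plugin would need.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


def parse_field_weights(spec: str) -> Dict[str, float]:
    """Parse 'field^weight field2 ...' (qf-style) into {field: weight}.
    A field listed without '^weight' gets weight 1.0."""
    weights: Dict[str, float] = {}
    for token in spec.split():
        name, caret, boost = token.partition("^")
        if not name:
            raise ValueError(f"missing field name in {token!r}")
        weight = float(boost) if caret else 1.0
        if weight <= 0:
            raise ValueError(f"weight for {name!r} must be positive")
        weights[name] = weight
    return weights


@dataclass(frozen=True)
class ParserConfig:
    field_weights: Dict[str, float]
    min_phrase_score: float  # hypothetical threshold, not taken from the patch


# Hypothetical parameter names and defaults; the patch hard-codes its fields instead.
DEFAULTS = {"pqp.fields": "title^2 content", "pqp.minPhraseScore": "0.5"}


def config_from_params(params: Mapping[str, str]) -> ParserConfig:
    """Merge user-supplied parameters over the defaults and validate them."""
    merged = {**DEFAULTS, **params}
    weights = parse_field_weights(merged["pqp.fields"])
    if not weights:
        raise ValueError("pqp.fields must name at least one field")
    return ParserConfig(weights, float(merged["pqp.minPhraseScore"]))
```

Keeping the defaults in one place means an install whose fields are called, say, `headline` and `body` only has to override `pqp.fields`.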
> Probabilistic-Query-Parser RequestHandler
> -----------------------------------------
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Akash Mehta
> Attachments: SOLR-9418.zip
>
>
> The main aim of this requestHandler is to get the best parsing for a given
> query. This basically means recognizing different phrases within the query.
> We need some kind of training data to generate these phrases. The way this
> project works is:
> 1.) Generate all possible parsings for the given query.
> 2.) For each possible parsing, calculate a naive-Bayes-like score.
> 3.) The main scoring is done by going through all the documents in the
> training set and finding the probability of a group of words occurring
> together as a phrase, compared with the probability of them occurring
> randomly in the same document. The score is then normalized. The title field
> is given higher importance than the content field; this weighting is
> configurable.
> 4.) Finally, after each possible parsing has been scored, the one with the
> highest score is returned.
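The four steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patch's code.

Step 3 is read as follows: for each field, take the Laplace-smoothed fraction of training documents containing all of a phrase's words that also contain them adjacent and in order. The per-field probabilities are then averaged using the field weights normalized to sum to 1, which is one possible reading of "normalized".

A parsing scores `log p` for every multi-word phrase it keeps together. It also scores `log(1 - p)` for every adjacent word pair it splits across a phrase boundary.

The function names, the toy training documents and the 2:1 title/content weighting are all hypothetical.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple

Phrase = Tuple[str, ...]
Doc = Dict[str, List[str]]  # field name -> tokens


def tokenize(text: str) -> List[str]:
    # Naive lowercase/whitespace tokenizer; a Solr plugin would use the field's analyzer.
    return text.lower().split()


def segmentations(words: Sequence[str]) -> List[List[Phrase]]:
    """Step 1: every split of the query into contiguous phrases (2^(n-1) of them)."""
    if not words:
        return [[]]
    result: List[List[Phrase]] = []
    for i in range(1, len(words) + 1):
        head = tuple(words[:i])
        for rest in segmentations(words[i:]):
            result.append([head] + rest)
    return result


def contains_phrase(tokens: Sequence[str], phrase: Phrase) -> bool:
    k = len(phrase)
    return any(tuple(tokens[j:j + k]) == phrase for j in range(len(tokens) - k + 1))


def phrase_prob(phrase: Phrase, docs: Sequence[Doc],
                field_weights: Mapping[str, float], alpha: float = 1.0) -> float:
    """Step 3 (assumed reading): P(words adjacent and in order | all words in the field),
    Laplace-smoothed per field, then averaged using normalized field weights."""
    total_weight = sum(field_weights.values())
    prob = 0.0
    for field, weight in field_weights.items():
        together = cooccur = 0
        for doc in docs:
            tokens = doc.get(field, [])
            if all(w in tokens for w in phrase):
                cooccur += 1
                if contains_phrase(tokens, phrase):
                    together += 1
        prob += weight * (together + alpha) / (cooccur + 2 * alpha)
    return prob / total_weight  # strictly inside (0, 1) for positive weights


def score(parsing: List[Phrase], docs: Sequence[Doc],
          field_weights: Mapping[str, float]) -> float:
    """Step 2: naive-Bayes-style log score, treating each decision as independent."""
    s = 0.0
    for seg in parsing:
        if len(seg) > 1:  # reward words kept together as a phrase
            s += math.log(phrase_prob(seg, docs, field_weights))
    for left, right in zip(parsing, parsing[1:]):  # reward splitting unrelated words
        s += math.log(1.0 - phrase_prob((left[-1], right[0]), docs, field_weights))
    return s


def best_parsing(query: str, docs: Sequence[Doc],
                 field_weights: Mapping[str, float]) -> List[Phrase]:
    """Step 4: return the highest-scoring parsing."""
    return max(segmentations(tokenize(query)),
               key=lambda p: score(p, docs, field_weights))


# Toy training set and weights (hypothetical); title weighted above content.
TRAINING_DOCS: List[Doc] = [
    {"title": tokenize("Pizza guide for New York"),
     "content": tokenize("the best pizza in new york city")},
    {"title": tokenize("York minster history"),
     "content": tokenize("a new exhibit opened in york")},
    {"title": tokenize("New York travel"),
     "content": tokenize("visit new york in spring")},
]
FIELD_WEIGHTS = {"title": 2.0, "content": 1.0}

print(best_parsing("new york pizza", TRAINING_DOCS, FIELD_WEIGHTS))
# [('new', 'york'), ('pizza',)]
```

A query of n words has 2^(n-1) parsings, and each probability scans the whole training set. A production version would therefore likely cap query length and precompute phrase and co-occurrence counts.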
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)