[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591921#comment-15591921 ]

Doug Turnbull commented on SOLR-9418:
-------------------------------------

Looking at your patch (I'm not a committer, just curious about the patch). A few
things jump out in a shallow reading that would probably need to change for
this to be accepted:

- Field names and thresholds likely need to be configurable, as most folks
won't necessarily have a field named exactly "title" or "content".
- Can this be a qparser plugin instead of a request handler? It's likely I'd 
want to use it alongside other qparsers and SearchComponents (like highlighting 
or facets).
- Can you provide some documentation on how the thresholds work/can be 
configured?
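
A minimal sketch of the first point, written in Python for brevity (an actual Solr plugin would be Java and would read these values from its init args or request params). The parameter names `pqp.fields` and `pqp.minScore`, the `ParserConfig` class and the default weights are all hypothetical, not taken from the attached patch. The field list reuses the `field^boost` convention that edismax's `qf` parameter already uses, so the syntax would be familiar to Solr users.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class ParserConfig:
    # Hypothetical defaults standing in for the patch's hard-coded fields.
    field_weights: Dict[str, float] = field(
        default_factory=lambda: {"title": 2.0, "content": 1.0})
    # Hypothetical cut-off: word groups scoring below this are not phrases.
    min_phrase_score: float = 0.0


def parse_field_weights(spec: str) -> Dict[str, float]:
    """Parse an edismax-qf-style spec such as 'headline^3 body' (no ^ means 1.0)."""
    weights: Dict[str, float] = {}
    for token in spec.split():
        name, sep, boost = token.partition("^")
        weights[name] = float(boost) if sep else 1.0
    return weights


def config_from_params(params: Mapping[str, str]) -> ParserConfig:
    """Override the defaults with whichever hypothetical params are present."""
    cfg = ParserConfig()
    if "pqp.fields" in params:
        cfg.field_weights = parse_field_weights(params["pqp.fields"])
    if "pqp.minScore" in params:
        cfg.min_phrase_score = float(params["pqp.minScore"])
    return cfg
```

For example, `config_from_params({"pqp.fields": "headline^3 body"})` yields the weights `{"headline": 3.0, "body": 1.0}` and keeps the default threshold, so an index without "title"/"content" fields can still use the parser.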

> Probabilistic-Query-Parser RequestHandler
> -----------------------------------------
>
>                 Key: SOLR-9418
>                 URL: https://issues.apache.org/jira/browse/SOLR-9418
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Akash Mehta
>         Attachments: SOLR-9418.zip
>
>
> The main aim of this requestHandler is to get the best parsing for a given
> query, which basically means recognizing the different phrases within the
> query. Some training data is needed to generate these phrases. The project
> works as follows:
> 1.) Generate all possible parsings for the given query.
> 2.) For each possible parsing, calculate a naive-Bayes-like score.
> 3.) The main scoring is done by going through all the documents in the
> training set and finding the probability of a group of words occurring
> together as a phrase, compared with them occurring randomly in the same
> document. The score is then normalized. The title field is given higher
> importance than the content field; this weighting is configurable.
> 4.) Finally, after each possible parsing has been scored, the one with the
> highest score is returned.
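
The four steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in SOLR-9418.zip. Training documents are hypothetical dicts of pre-tokenized fields. A phrase's probability is estimated as the Laplace-smoothed fraction of documents containing all of its words that also contain them adjacently. Each segment scores the field-weighted log of that ratio over a hypothetical `baseline`, and "normalized" is taken to mean dividing by the total field weight. Single-word segments score 0, so a phrase only wins when its evidence beats the baseline, and segment scores are summed as if independent (the naive-Bayes-style assumption).

```python
import math
from itertools import combinations
from typing import List, Mapping, Sequence, Tuple

Phrase = Tuple[str, ...]
Doc = Mapping[str, Sequence[str]]  # hypothetical: field name -> tokens


def all_parsings(tokens: Sequence[str]) -> List[List[Phrase]]:
    """Step 1: every split of tokens into contiguous segments (2**(n-1) of them)."""
    n = len(tokens)
    parsings = []
    for k in range(n):  # k = number of cut points between tokens
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            parsings.append([tuple(tokens[a:b]) for a, b in zip(bounds, bounds[1:])])
    return parsings


def contains_phrase(tokens: Sequence[str], phrase: Phrase) -> bool:
    """True if phrase occurs as a contiguous run inside tokens."""
    k = len(phrase)
    return any(tuple(tokens[i:i + k]) == phrase for i in range(len(tokens) - k + 1))


def phrase_ratio(phrase: Phrase, docs: Sequence[Doc], field: str) -> float:
    """Smoothed P(words adjacent as a phrase | all words appear in the same doc)."""
    together = cooccur = 0
    for doc in docs:
        toks = doc.get(field, [])
        if set(phrase) <= set(toks):
            cooccur += 1
            if contains_phrase(toks, phrase):
                together += 1
    return (together + 1) / (cooccur + 2)  # Laplace smoothing; unseen -> 0.5


def segment_score(phrase: Phrase, docs: Sequence[Doc],
                  field_weights: Mapping[str, float], baseline: float) -> float:
    """Step 3: field-weighted log ratio, normalized by the total field weight."""
    if len(phrase) == 1:
        return 0.0  # single words carry no phrase evidence
    total = sum(field_weights.values())
    return sum(w * math.log(phrase_ratio(phrase, docs, f) / baseline)
               for f, w in field_weights.items()) / total


def best_parsing(query: str, docs: Sequence[Doc],
                 field_weights: Mapping[str, float],
                 baseline: float = 0.5) -> List[Phrase]:
    """Steps 2 and 4: sum segment scores (naive independence), return the best."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def score(parsing: List[Phrase]) -> float:
        return sum(segment_score(p, docs, field_weights, baseline) for p in parsing)

    # Ties go to the parsing with more segments, i.e. fewer claimed phrases.
    return max(all_parsings(tokens), key=lambda p: (score(p), len(p)))


if __name__ == "__main__":
    # Toy, made-up training set; the title field is weighted twice as heavily.
    docs = [
        {"title": ["new", "york", "hotels"], "content": ["cheap", "hotels", "in", "new", "york"]},
        {"title": ["new", "york", "pizza"], "content": ["best", "pizza", "in", "new", "york"]},
        {"title": ["hotels", "guide"], "content": ["a", "new", "guide", "to", "york", "hotels"]},
    ]
    print(best_parsing("New York hotels", docs, {"title": 2.0, "content": 1.0}))
    # [('new', 'york'), ('hotels',)]
```

Step 1 is exponential in query length (a query of n terms has 2^(n-1) parsings), so a production version would likely need to cap query length or prune candidates.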



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
