Hi Ali,

I agree with the others: there is no good way to assign Lucene-like scores to your external results. But if you have some objective measure of goodness that doesn't depend on the Lucene scores, you can apply it to both result sets and merge them that way.
One such measure could be the number of words in your query that also appear in the title or, if you want to take title length into account, the Jaccard similarity between the query words and the title words.

I once solved a different (but related) problem with a somewhat different approach; I mention it here in case it gives you some ideas. In my previous job we would "concept map" documents using our ontology, so each document could be thought of as a (weighted) bag of concepts, and our concept search queried this bag of concepts. The indexing process was expensive, and we had just migrated to a new Java-based annotation pipeline that assigned very different, but "intuitively more correct", concept scores to documents. Whereas the old system typically assigned concept scores in the 20,000 range, the new system assigned scores in the 100 range to similar documents. We also had a set of huge indexes crawled with the old pipeline that would have taken weeks or months to reprocess with the new one, so for one client we decided to merge results from the old index with newly crawled content (a much smaller set).

To do that, I calculated the z-score (across all concepts) for both content sets and used it to rescale the concept scores of the old set to the scale of the new set. Although the underlying math was a bit sketchy, the merged results looked quite good.
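Here is a rough Python sketch of both ideas, in case it helps make them concrete. It is only an illustration, not the code I actually used: the function names (`jaccard`, `merge_by_jaccard`, `rescale_by_zscore`) are made up for this example, words are split naively on whitespace, and the rescaling step is one plausible reading of "use the z-score to rescale", namely standardizing each old score against the old set and then mapping it onto the mean and standard deviation of the new set.

```python
from statistics import fmean, pstdev


def jaccard(query: str, title: str) -> float:
    """Jaccard similarity between the lowercased word sets of query and title."""
    q, t = set(query.lower().split()), set(title.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def merge_by_jaccard(query, lucene_titles, api_titles, top_n=5):
    """Score titles from both sources with the same measure, then rank them together."""
    scored = [
        (jaccard(query, title), source, title)
        for source, titles in (("lucene", lucene_titles), ("api", api_titles))
        for title in titles
    ]
    # Sort on the score only; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda row: row[0], reverse=True)[:top_n]


def rescale_by_zscore(old_scores, new_scores):
    """Map old_scores onto the scale of new_scores via z-scores (assumed reading)."""
    mu_old, sd_old = fmean(old_scores), pstdev(old_scores)
    mu_new, sd_new = fmean(new_scores), pstdev(new_scores)
    if sd_old == 0:
        # All old scores are identical, so they all land on the new mean.
        return [mu_new for _ in old_scores]
    return [(x - mu_old) / sd_old * sd_new + mu_new for x in old_scores]


if __name__ == "__main__":
    hits = merge_by_jaccard(
        "lucene scoring",
        lucene_titles=["Lucene scoring explained", "Cooking pasta"],
        api_titles=["lucene scoring"],
    )
    for score, source, title in hits:
        print(f"{score:.2f}  {source:6}  {title}")
    print(rescale_by_zscore([10000, 20000, 30000], [50, 100, 150]))
```

Note that the merge ignores the Lucene score entirely and ranks both sources on the shared measure, which is the whole point: the two score scales never have to be compared directly.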
Hope this helps,
-sujit

On Fri, Apr 10, 2015 at 2:32 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> There is doc for tf*idf scoring in the javadoc:
>
> http://lucene.apache.org/core/5_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> The IndexSearcher#explain method returns an Explanation structure which
> details the scoring for a document:
>
> http://lucene.apache.org/core/5_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query, int)
>
> -- Jack Krupansky
>
> On Fri, Apr 10, 2015 at 4:15 PM, Gregory Dearing <gregdear...@gmail.com> wrote:
>
> > Hi Ali,
> >
> > The short answer to your question is... there's no good way to create a
> > score from your result string, without using the Lucene index, that will
> > be directly comparable to the Lucene score. The reason is that the score
> > isn't just a function of the query and the contents of the document. It's
> > also (usually) a function of the contents of the entire corpus... or
> > rather how common terms are across the entire corpus.
> >
> > That being said... the default scoring algorithm is based on tf/idf. The
> > implementation isn't in any one class... every query type (e.g. Term
> > Query, Boolean Query, etc...) contains its own code for calculating
> > scores. So the complete scoring formula will depend on the type of
> > queries you're using. Many of those implementations also call into the
> > Similarity API that you mentioned.
> >
> > If you'd like to see representative examples of scoring code, then take a
> > look at TermWeight/TermScorer, and also BooleanWeight, which has several
> > associated scorers.
> >
> > -Greg
> >
> > On Tue, Apr 7, 2015 at 1:32 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I'm in a situation where a search query string is being submitted
> > > simultaneously to Lucene, and to an external API.
> > >
> > > Results are fetched from both sources. I already have a score available
> > > for Lucene results, but I don't have a score for the results fetched
> > > from the external source.
> > >
> > > I'd like to calculate scores of results from the API, so that I can
> > > rank the results by the score, and show the top 5 results from both
> > > sources. (I.e the results would be merged.)
> > >
> > > Is there any Lucene API method, to which I can submit a search string
> > > and result string, and get a score back? If not, which class contains
> > > the source code for calculating the score, so that I can implement my
> > > own scoring class, using the same algorithm?
> > >
> > > I've looked at the Similarity class Javadocs, but it doesn't include
> > > any source code for calculating the score.
> > >
> > > Any help would be greatly appreciated. Thanks.