Hi!

We are working on an experimental code-search engine that helps developers find example code snippets based on what they have already typed in their editor. Our homemade search engine produces some nice results, but its performance is rather limited :-) We are therefore evaluating whether Lucene could solve our performance issues. However, we are not familiar with Lucene, so I wonder whether some of you could help me figure out whether Lucene fits our problem well. Thanks in advance for your comments.

The situation is as follows. For each source code file we extract a number of code properties, e.g. which types are used in the code, which methods are overridden, or which methods are called inside a method body. For each source code file we end up with a JSON structure similar to this:
{
    "class": "my.ExampleClass",
    "extends": "the.SuperClass",
    "overrides": [
        "the.SuperClass.method1()",
        "the.SuperClass.method2()"
    ],
    "used types": [
        "a.Type1",
        "a.Type2",
        ...
    ],
    "used methods": [
        "a.Type1.method32()",
        "a.Type1.method23()",
        ...
    ],
    <few more things>
}
The scoring function we use is rather simplistic. Given a query (which has the same structure as the document above), we compute for each feature (i.e. "used methods", "used types", "overrides", etc.) a simple match score f_i: the percentage of overlap between the query's values for that feature and the document's values. We then multiply each feature score f_i by an individual feature weight w_i and sum everything up into one overall score, i.e. score = sum_i( w_i * f_i ).
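
To make that concrete, here is a rough sketch in plain Java of what we currently do (all class and method names are made up for illustration; "overlap" is the fraction of query values that also appear in the document):

import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SnippetScorer {

    // Overlap of one feature: |query values ∩ doc values| / |query values|.
    static double overlap(Set<String> queryValues, Set<String> docValues) {
        if (queryValues.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(queryValues);
        common.retainAll(docValues);
        return (double) common.size() / queryValues.size();
    }

    // Overall score: sum over all features of w_i * f_i.
    static double score(Map<String, Set<String>> query,
                        Map<String, Set<String>> doc,
                        Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : query.entrySet()) {
            String feature = e.getKey(); // e.g. "used types", "overrides"
            Set<String> docValues = doc.getOrDefault(feature, Collections.emptySet());
            double w = weights.getOrDefault(feature, 1.0);
            total += w * overlap(e.getValue(), docValues);
        }
        return total;
    }
}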

My questions are: Is it meaningful to use Lucene in this setup, or, put differently, can I implement that scoring scheme with Lucene easily? What would such a solution look like? Would subclassing Scorer be enough?
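
For reference, this is roughly what I came up with after skimming the docs, assuming a recent Lucene version (BooleanQuery.Builder / BoostQuery) and field names matching our JSON features; it is only a naive sketch, so please correct me if it goes in the wrong direction:

import java.util.List;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SnippetQueryBuilder {

    // One SHOULD clause per feature value, boosted by the feature weight,
    // so documents sharing more values with the query score higher.
    public static Query build(Map<String, List<String>> queryFeatures,
                              Map<String, Float> featureWeights) {
        BooleanQuery.Builder top = new BooleanQuery.Builder();
        for (Map.Entry<String, List<String>> e : queryFeatures.entrySet()) {
            String field = e.getKey(); // e.g. "used_types", "overrides"
            float weight = featureWeights.getOrDefault(field, 1.0f);
            for (String value : e.getValue()) {
                Query term = new TermQuery(new Term(field, value));
                top.add(new BoostQuery(term, weight), BooleanClause.Occur.SHOULD);
            }
        }
        return top.build();
    }
}

My impression is that this would rank by Lucene's default similarity rather than by our plain overlap percentage, which is why I am wondering whether a custom Similarity or Scorer would be the right place to plug our scheme in.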

Many thanks in advance for any advice.

All the best,
Marcel
