Hi!

We are working on an experimental code-search engine that helps developers find example code snippets based on what they have already typed in their editor. Our homemade search engine produces some nice results, but its performance is rather limited :-) We are therefore evaluating whether Lucene could solve our performance issues. However, we are not familiar with Lucene, so I wonder whether some of you could help me figure out whether Lucene fits our problem well. Thanks in advance for your comments.

The situation is as follows. For each source code file we extract a number of code properties, e.g. which types are used in the code, which methods are overridden, or which methods are called inside a method body. For each source code file we end up with a JSON structure similar to this:
{
    "class": "my.ExampleClass",
    "extends": "the.SuperClass",
    "overrides": [
        "the.SuperClass.method1()",
        "the.SuperClass.method2()"
    ],
    "used types": [
        "a.Type1",
        "a.Type2",
        ...
    ],
    "used methods": [
        "a.Type1.method32()",
        "a.Type1.method23()",
        ...
    ],
    <few more things>
}
The scoring function we use is rather simplistic. Given a query (which has the same structure as the document above), we compute for each feature (i.e. "used methods", "used types", "overrides", etc.) a simple match score f_i: the percentage of overlap between the query's values for that feature and the document's values. We then multiply each feature score f_i by an individual feature weight w_i and sum everything up into one overall score, i.e. score = sum_i( w_i * f_i ).
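
To make that concrete, here is a rough sketch in plain Java of what we currently do (all class and method names are made up for illustration; "overlap" is the fraction of query values that also appear in the document):

import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SnippetScorer {

    // Overlap of one feature: |query values ∩ doc values| / |query values|.
    static double overlap(Set<String> queryValues, Set<String> docValues) {
        if (queryValues.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(queryValues);
        common.retainAll(docValues);
        return (double) common.size() / queryValues.size();
    }

    // Overall score: sum over all features of w_i * f_i.
    static double score(Map<String, Set<String>> query,
                        Map<String, Set<String>> doc,
                        Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : query.entrySet()) {
            String feature = e.getKey(); // e.g. "used types", "overrides"
            Set<String> docValues = doc.getOrDefault(feature, Collections.emptySet());
            double w = weights.getOrDefault(feature, 1.0);
            total += w * overlap(e.getValue(), docValues);
        }
        return total;
    }
}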

My questions are: Is it meaningful to use Lucene in this setup, or, put differently, can I implement that scoring scheme with Lucene easily? What would such a solution look like? Would subclassing Scorer be enough?
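
For reference, this is roughly what I came up with after skimming the docs, assuming a recent Lucene version (BooleanQuery.Builder / BoostQuery) and field names matching our JSON features; it is only a naive sketch, so please correct me if it goes in the wrong direction:

import java.util.List;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SnippetQueryBuilder {

    // One SHOULD clause per feature value, boosted by the feature weight,
    // so documents sharing more values with the query score higher.
    public static Query build(Map<String, List<String>> queryFeatures,
                              Map<String, Float> featureWeights) {
        BooleanQuery.Builder top = new BooleanQuery.Builder();
        for (Map.Entry<String, List<String>> e : queryFeatures.entrySet()) {
            String field = e.getKey(); // e.g. "used_types", "overrides"
            float weight = featureWeights.getOrDefault(field, 1.0f);
            for (String value : e.getValue()) {
                Query term = new TermQuery(new Term(field, value));
                top.add(new BoostQuery(term, weight), BooleanClause.Occur.SHOULD);
            }
        }
        return top.build();
    }
}

My impression is that this would rank by Lucene's default similarity rather than by our plain overlap percentage, which is why I am wondering whether a custom Similarity or Scorer would be the right place to plug our scheme in.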

Many thanks in advance for any advice.

All the best,
Marcel
