Steve Rowe and I have added scoring.xml (with some contributions from
Karl Wettin, Chris Hostetter and others) to the xdocs directory (and
scoring.html to the docs directory). Our goals in writing this
document were:
1. To better understand scoring
2. To document how scoring works for the Lucene community, as well as
to document how to make changes to scoring for a specific application
3. To kick-start more documentation on scoring
I think we have achieved #1, which doesn't really benefit many others
yet. As for #2, that remains to be seen. #3 is up to us to do.
To the end of achieving #2, I would appreciate it if other developers
could take a look at http://lucene.apache.org/java/docs/scoring.html
and provide us feedback on any and all parts of this document.
Note that the above link is not yet hooked into the main menu system
on the left-hand side of the Lucene site. In a week or two, once we
have some feedback and updates, my plan is to hook it into the
projects.xml menu under the menu title "Scoring".
Specifically, we are interested in:
1. Errata, clarifications, improvements, additions of things that are
useful. Where did we get the algorithms/descriptions wrong, and where
could they be made clearer? Some of the areas of particular
interest are those highlighted in yellow. Additionally:
a. Filling in the "Big Picture" section with lower level details on
BooleanScorer2. Is this necessary?
b. Other examples of changing Similarity
c. Examples of adding your own Query. It would be great to have a
write-up on the motivation behind SpanQuery or some of the other
Query classes (other than TermQuery). It would also be great to have
more on the semantics of what goes into implementing the various
methods on Weight and Scorer.
d. Should there be more of a discussion about how Hits/Searchers/
Filters work? I purposely left these out because I wanted to focus on
scoring, but these pieces do play a role in enabling scoring.
2. Organizational suggestions -- i.e., how could this document be
better organized?
3. Grammar, spelling
4. If anyone knows how to get the Greek Sigma character to pass
through in Anakia (Velocity), I would be most appreciative; the
section on the scoring formula needs it. The usual hex entity
references don't seem to pass through correctly. I suspect there
needs to be a change in the site.vsl, but I don't know how to do it
(there is also an entity reference in systemproperties.xml that is
not working correctly).
As for goal #3, please feel free to add more insight into the scoring
process, particularly if you can add value on the "why" question
(i.e., why scoring is done this way). This document is most likely
just a start on documenting how scoring works.
As for changes, the best way is to submit a patch in JIRA (or just
commit the changes, if you can). If not JIRA, then at least reply to
this message.
-Grant
--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886