[ https://issues.apache.org/jira/browse/LUCENE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066199#comment-13066199 ]
Andrzej Bialecki commented on LUCENE-3320:
-------------------------------------------

An interesting concept to consider under this topic is sentence-level proximity scoring. It rests on the assumption that terms occurring within a single sentence are often more strongly associated than average. When sentence boundaries are known, term positions can therefore be reduced to sentence numbers (i.e. all postings from the same sentence share one position, which is the sentence number). This is a middle ground between no proximity data (omitPositions) and full proximity data.

There is some literature indicating that this approach is promising:
http://www.springerlink.com/content/t5355418276v7115
It is also mentioned in the papers on static index pruning.

> Explore Proximity Scoring
> --------------------------
>
>                 Key: LUCENE-3320
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3320
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: Positions Branch
>            Reporter: Simon Willnauer
>             Fix For: Positions Branch
>
>
> Positions will be first-class citizens sooner rather than later. We should
> explore proximity scoring possibilities as well as collection / scoring
> algorithms like those proposed in LUCENE-2878 (2-phase collection).
> This paper might provide some basis for an actual scoring implementation:
> http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
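
To make the sentence-level position idea from the comment above concrete, here is a minimal sketch in plain Python. It is not Lucene code and does not reflect anything on the Positions Branch. The function names (sentence_level_postings, shared_sentences) are hypothetical, the regex sentence splitter is a crude stand-in for real sentence detection, and recording each term only once per sentence is a choice made for this sketch.

```python
import re
from collections import defaultdict


def sentence_level_postings(text):
    """Map each term to the sentence numbers it occurs in (one document)."""
    # Assumption: splitting on ., ! or ? approximates sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    postings = defaultdict(list)
    for sentence_no, sentence in enumerate(sentences):
        # Every occurrence inside a sentence collapses to the same "position",
        # so each term is recorded at most once per sentence (sketch choice).
        for term in set(re.findall(r"\w+", sentence.lower())):
            postings[term].append(sentence_no)
    return dict(postings)


def shared_sentences(postings, term_a, term_b):
    """Return the sentence numbers in which both terms occur."""
    a = set(postings.get(term_a, []))
    b = set(postings.get(term_b, []))
    return sorted(a & b)


doc = ("Lucene stores term positions. "
       "Positions enable proximity scoring. "
       "Scoring can be slow.")
postings = sentence_level_postings(doc)

print(postings["positions"])                                # [0, 1]
print(shared_sentences(postings, "positions", "scoring"))   # [1]
print(shared_sentences(postings, "lucene", "slow"))         # []
```

With positions reduced this way, a proximity match becomes an equality test on sentence numbers instead of a distance computation on token offsets. That is the sense in which it sits between omitPositions and full positional data.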