David Smiley created LUCENE-5779:
------------------------------------
Summary: Improve BBox AreaSimilarity algorithm to consider lines and points
Key: LUCENE-5779
URL: https://issues.apache.org/jira/browse/LUCENE-5779
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley
GeoPortal's area-overlap algorithm doesn't consider lines or points; comparing
either one drives the score to 0. After thinking about this for a while, I've
come up with an alternative scoring algorithm (already coded, tested, and
documented):
New Javadocs:
{code:java}
/**
* The algorithm is implemented as envelope on envelope overlays rather than
* complex polygon on complex polygon overlays.
* <p/>
* Spatial relevance scoring algorithm:
* <DL>
* <DT>queryArea</DT> <DD>the area of the input query envelope</DD>
* <DT>targetArea</DT> <DD>the area of the target envelope (per Lucene document)</DD>
* <DT>intersectionArea</DT> <DD>the area of the intersection between the query and target envelopes</DD>
* <DT>queryTargetProportion</DT> <DD>A 0-1 factor that splits the score between query and target;
* 0.5 weights them evenly.</DD>
*
* <DT>queryRatio</DT> <DD>intersectionArea / queryArea; (see note)</DD>
* <DT>targetRatio</DT> <DD>intersectionArea / targetArea; (see note)</DD>
* <DT>queryFactor</DT> <DD>queryRatio * queryTargetProportion;</DD>
* <DT>targetFactor</DT> <DD>targetRatio * (1 - queryTargetProportion);</DD>
* <DT>score</DT> <DD>queryFactor + targetFactor;</DD>
* </DL>
* Note: The actual computation of queryRatio and targetRatio is more complicated so that it considers
* points and lines. Lines use the ratio of overlapping length, and points score either 1.0 or 0.0
* depending on whether they intersect or not.
* <p />
* Based on Geoportal's
* <a href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialRankingValueSource.java">
* SpatialRankingValueSource</a> but modified. GeoPortal's algorithm yields a score of 0
* if either a line or a point is compared, and it doesn't output a 0-1 normalized score
* (it multiplies the factors).
*
* @lucene.experimental
*/
{code}
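To make the formula above concrete, here is a minimal, hypothetical sketch of the envelope-on-envelope score. It is not the Lucene/GeoPortal code: the class name BBoxOverlapScoreSketch, the Box record, and the exact line/point handling in ratio() are assumptions, a guess at what the "more complicated" computation in the note could look like. It also assumes flat, axis-aligned rectangles that never cross the dateline.

{code:java}
/**
 * Hypothetical sketch (not Lucene's actual code) of the envelope-on-envelope
 * score described in the Javadoc above. Assumes flat, axis-aligned
 * rectangles that never cross the dateline.
 */
public final class BBoxOverlapScoreSketch {

  /** Axis-aligned envelope. A point has zero width and height; a line has exactly one of them zero. */
  public record Box(double minX, double maxX, double minY, double maxY) {
    double width() { return maxX - minX; }
    double height() { return maxY - minY; }
  }

  /** Length of the overlap of [aMin, aMax] and [bMin, bMax]; -1 if the intervals are disjoint. */
  static double overlap(double aMin, double aMax, double bMin, double bMax) {
    double lo = Math.max(aMin, bMin);
    double hi = Math.min(aMax, bMax);
    return hi >= lo ? hi - lo : -1;
  }

  /**
   * Fraction of {@code shape} covered by {@code other}, measured in the shape's own
   * dimension: area for a box, length for a line, and 1.0 or 0.0 for a point.
   * (Assumed handling of lines and points.)
   */
  public static double ratio(Box shape, Box other) {
    double ox = overlap(shape.minX(), shape.maxX(), other.minX(), other.maxX());
    double oy = overlap(shape.minY(), shape.maxY(), other.minY(), other.maxY());
    if (ox < 0 || oy < 0) {
      return 0.0; // disjoint along some axis: nothing intersects
    }
    double w = shape.width();
    double h = shape.height();
    if (w > 0 && h > 0) {
      return (ox * oy) / (w * h); // box: intersectionArea / area
    } else if (w > 0) {
      return ox / w; // horizontal line: overlapping length / length
    } else if (h > 0) {
      return oy / h; // vertical line: overlapping length / length
    }
    return 1.0; // point that intersects the other shape
  }

  /** score = queryRatio * p + targetRatio * (1 - p), where p is queryTargetProportion. */
  public static double score(Box query, Box target, double queryTargetProportion) {
    double queryFactor = ratio(query, target) * queryTargetProportion;
    double targetFactor = ratio(target, query) * (1 - queryTargetProportion);
    return queryFactor + targetFactor;
  }

  public static void main(String[] args) {
    Box query = new Box(0, 10, 0, 10);
    System.out.println(score(query, new Box(5, 15, 0, 10), 0.5));   // half-overlapping box: 0.5
    System.out.println(score(query, new Box(3, 3, 4, 4), 0.5));      // point inside query: 0.5
    System.out.println(score(query, new Box(-5, 5, 2, 2), 0.5));     // line, half inside: 0.25
    System.out.println(score(query, new Box(20, 20, 20, 20), 0.5));  // point outside: 0.0
  }
}
{code}

In the point and line cases queryRatio is 0 (the intersection has no area), so multiplying the factors, as the Javadoc says GeoPortal does, would score them 0. Summing the weighted factors instead keeps those matches non-zero while still bounding the score to 0-1.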
--
This message was sent by Atlassian JIRA
(v6.2#6252)