[ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354971#comment-14354971 ]
David Smiley commented on LUCENE-6191:
--------------------------------------
FYI I'm adding an advisory to the javadocs of PrefixTreeFacetCounter noting that
double-counting can occur in certain avoidable situations:
{code:java}
* <em>NOTE:</em> If for a given document and a given field using
* {@link org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy}
* multiple values are indexed (i.e. multi-valued) and at least one of them is a non-point, then there is a possibility
* of double-counting the document in the facet results. Since each shape is independently turned into grid cells at
* a resolution chosen by the shape's size, it's possible they will be indexed at different resolutions. This means
* the document could be present in BOTH the postings for a cell in both its prefix and leaf variants. To avoid this,
* use a single valued field with a {@link com.spatial4j.core.shape.ShapeCollection} (or WKT equivalent). Or
* calculate a suitable level/distErr to index both and call
* {@link org.apache.lucene.spatial.prefix.PrefixTreeStrategy#createIndexableFields(com.spatial4j.core.shape.Shape, int)}
* with the same value for all shapes for a given document/field.
{code}
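To make the scenario in the note concrete, here is a toy model of it. This is not Lucene code: the term encoding (a trailing '+' marking a leaf cell), the class and method names, and the naive counting rule are all assumptions made for illustration. It only shows how one document can land in the postings of both variants of a cell when two shapes are indexed separately, and why indexing one combined shape avoids that in this model.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only, not Lucene code. Terms are cell paths; a trailing '+' marks a leaf cell. */
final class DoubleCountSketch {
    // term -> IDs of the documents whose postings contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index one shape of a document as the set of cell terms it was turned into. */
    void index(int docId, String... cellTerms) {
        for (String term : cellTerms) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    /** Naive per-cell count (an assumption): sum the postings of the prefix and leaf variants. */
    int count(String cell) {
        return size(cell) + size(cell + "+");
    }

    private int size(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Two shapes of the same document, indexed separately at different resolutions. */
    static int mixedResolutions() {
        DoubleCountSketch idx = new DoubleCountSketch();
        idx.index(7, "A", "AB+");         // shape 1: coarse, cell AB is a leaf
        idx.index(7, "A", "AB", "ABC+");  // shape 2: finer, cell AB is only a prefix
        return idx.count("AB");           // doc 7 is found under both "AB" and "AB+"
    }

    /** The same document indexed once, as a single combined shape. */
    static int singleCombinedShape() {
        DoubleCountSketch idx = new DoubleCountSketch();
        // One pass over one shape emits cell AB in only one variant in this model.
        idx.index(7, "A", "AB", "ABC+");
        return idx.count("AB");
    }
}
```

In this sketch mixedResolutions() counts document 7 twice for cell AB, while singleCombinedShape() counts it once, which corresponds to the single-valued ShapeCollection suggestion in the note.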
> Spatial 2D faceting (heatmaps)
> ------------------------------
>
> Key: LUCENE-6191
> URL: https://issues.apache.org/jira/browse/LUCENE-6191
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 5.1
>
> Attachments: LUCENE-6191__Spatial_heatmap.patch,
> LUCENE-6191__Spatial_heatmap.patch, LUCENE-6191__Spatial_heatmap.patch
>
>
> Lucene spatial's PrefixTree (grid) based strategies index data in a way
> highly amenable to faceting on grid cells to compute a so-called _heatmap_.
> The underlying code in this patch uses the PrefixTreeFacetCounter utility
> class, which was recently refactored out of faceting for NumberRangePrefixTree
> (LUCENE-5735). At a low level, the terms (== grid cells) are navigated
> per-segment, forward-only with TermsEnum.seek, so it's pretty quick and
> furthermore requires no extra caches & no docvalues. Ideally you should use
> QuadPrefixTree (or Flex once it comes out) to maximize the number of grid
> levels, which in turn maximizes the fidelity of choices when you ask for a
> grid covering a region. Conveniently, the provided capability returns the
> data in a 2-D grid of counts, so the caller needn't know a thing about how
> the data is encoded in the prefix tree. Well almost... at this point they
> need to provide a grid level, but I'll soon provide a means of deriving the
> grid level based on a min/max cell count.
> I recommend QuadPrefixTree with geo=false so that you can provide a square
> world-bounds (360x360 degrees), which means square grid cells which are more
> desirable to display than rectangular cells.
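The grid arithmetic behind the description can be sketched as follows. This is a hypothetical, self-contained illustration and not the API in the patch: the class and method names are made up, it assumes a quad tree in which each level halves the world extent in both dimensions, and it assumes the requested region is aligned to cell boundaries. It shows why a 360x360 world yields square cells where a 360x180 world does not, one possible way a grid level could be derived from a maximum cell count, and what a 2-D grid of counts looks like for plain points.

```java
/** Toy heatmap arithmetic, not the Lucene API. All names here are hypothetical. */
final class HeatmapGridSketch {
    /** Cell size in degrees at a level, assuming each level halves the world extent. */
    static double cellSize(double worldExtentDegrees, int level) {
        return worldExtentDegrees / (1L << level);
    }

    /** Deepest level (up to maxLevels) whose grid over the region stays within maxCells. */
    static int levelForMaxCells(double worldW, double worldH,
                                double regionW, double regionH,
                                int maxCells, int maxLevels) {
        int best = 0; // level 0: the whole world is a single cell
        for (int level = 1; level <= maxLevels; level++) {
            long cols = (long) Math.ceil(regionW / cellSize(worldW, level));
            long rows = (long) Math.ceil(regionH / cellSize(worldH, level));
            if (cols * rows > maxCells) {
                break; // deeper levels only produce more cells
            }
            best = level;
        }
        return best;
    }

    /** Count points (x, y) into a rows x cols grid whose lower-left corner is (minX, minY). */
    static int[][] counts(double[][] points, double minX, double minY,
                          double cellW, double cellH, int cols, int rows) {
        int[][] grid = new int[rows][cols];
        for (double[] p : points) {
            int col = (int) Math.floor((p[0] - minX) / cellW);
            int row = (int) Math.floor((p[1] - minY) / cellH);
            if (col >= 0 && col < cols && row >= 0 && row < rows) {
                grid[row][col]++; // points outside the grid are ignored
            }
        }
        return grid;
    }
}
```

For example, at level 3 a 360-degree extent gives 45-degree cells in both dimensions, whereas a 360x180 world would give 45 x 22.5 degree cells, i.e. rectangles rather than squares.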