[
https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated SOLR-7005:
-------------------------------
Attachment: SOLR-7005_heatmap.patch
Thanks for the encouragement Shalin, and Erik on #lucene-dev, and others via
email who have gotten wind of this.
Here's the first-draft patch. It is still based on being its own
SearchComponent, and it doesn't yet support distributed-search -- those issues
should be addressed next.
I added support for the "distErr" parameter to facilitate computing the grid
level in the same fashion as used by Lucene spatial to ultimately derive a grid
level for a given shape (a rect/box in this case). In fact it re-uses utility
methods in Lucene spatial to compute the grid level given the world boundary,
distErr (if provided) and distErrPct (if provided). The units of distErr are
the same as the distanceUnits attribute on the field type (a new Solr 5 thing). So
if the units are kilometers and distErr is 100 then the grid cells returned are at
least as precise as 100 kilometers (which BTW is a little less than a spherical
degree for Earth, which is 111.2km). The 512x256 heatmap I uploaded was
generated by specifying distErr=111.2. A client could compute a distErr if
they instead know the minimum number of cells they want in the heatmap. I may bake
that formula in and provide a minCells param.
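To make the minCells idea concrete, here is a rough client-side sketch. It is
not the formula the patch would bake in; it simply assumes square cells over a
rectangular box and ignores that prefix-tree grid levels are discrete (so the
real grid may be finer than requested). The function names and the Earth
radius constant are assumptions made for illustration.

```python
import math

# Mean Earth radius in kilometers; an assumed constant for this sketch.
EARTH_MEAN_RADIUS_KM = 6371.0087714


def km_per_degree(radius_km=EARTH_MEAN_RADIUS_KM):
    """Length in km of one degree of arc on a sphere of the given radius."""
    return math.radians(1.0) * radius_km


def dist_err_for_min_cells(width_deg, height_deg, min_cells):
    """Approximate cell size (in degrees) that yields about min_cells square cells.

    Hypothetical helper: solves (width / d) * (height / d) = min_cells for d.
    """
    if min_cells <= 0:
        raise ValueError("min_cells must be positive")
    return math.sqrt(width_deg * height_deg / min_cells)


# Whole world, asking for at least 64 x 32 cells.
cell_deg = dist_err_for_min_cells(360.0, 180.0, 64 * 32)
print(cell_deg)                    # 5.625
print(round(km_per_degree(), 1))   # 111.2
# distErr in kilometers, if the field's distanceUnits is kilometers.
print(cell_deg * km_per_degree())
```

With the whole-world box, a cell size of 5.625 degrees divides evenly into 64
columns and 32 rows.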
For distributed-search, I'm thinking the internal shard requests will use PNG
since it's compressed, and then the user can get whatever format they asked
for. I only want to write the aggregation logic once, not per-format :-)
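Whatever the wire format ends up being, the aggregation itself amounts to an
element-wise sum of the per-shard count grids. The sketch below is only an
illustration of that step, not code from the patch: it assumes every shard
used the same gridLevel and box (so the grids have identical dimensions), that
a shard with no hits reports a null grid, and that decoding from PNG has
already happened. merge_counts is a hypothetical name.

```python
def merge_counts(shard_grids, rows, columns):
    """Sum per-shard count grids element-wise.

    A grid of None is treated as all zeros. Returns None when every cell
    of the merged grid is zero, mirroring the "counts is null" convention.
    """
    total = [[0] * columns for _ in range(rows)]
    for grid in shard_grids:
        if grid is None:
            continue  # this shard had no hits in the box
        if len(grid) != rows:
            raise ValueError("shard grid has the wrong number of rows")
        for r, row in enumerate(grid):
            if len(row) != columns:
                raise ValueError("shard grid has the wrong number of columns")
            for c, count in enumerate(row):
                total[r][c] += count
    if all(count == 0 for row in total for count in row):
        return None
    return total


shard_a = [[0, 2], [1, 0]]
shard_b = None
shard_c = [[1, 1], [0, 3]]
print(merge_counts([shard_a, shard_b, shard_c], rows=2, columns=2))
# [[1, 3], [1, 3]]
```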
As a part of this work I found it useful to add SpatialUtils.parseRectangle
which parses the {{[lowerLeftPoint TO upperRightPoint]}} format. In another
issue I want to re-use this to provide a more Solr-friendly way of indexing a
rectangle (for e.g. BBoxField or RPT) or for specifying worldBounds on the
field type.
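For readers unfamiliar with the range-style rectangle syntax, here is a small
standalone sketch of parsing it. It is not the SpatialUtils.parseRectangle
implementation from the patch; it assumes each point is written as "x y"
(space separated, optionally quoted), as in the bbox example in the issue
description, and the function names are made up for illustration.

```python
import re

# [lowerLeftPoint TO upperRightPoint], with optional double quotes around points.
_RECT_PATTERN = re.compile(
    r'^\s*\[\s*"?([^"\]]+?)"?\s+TO\s+"?([^"\]]+?)"?\s*\]\s*$'
)


def parse_point(text):
    """Parse an 'x y' point into a pair of floats (assumed point syntax)."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError("expected 'x y' but got %r" % text)
    return float(parts[0]), float(parts[1])


def parse_rectangle(text):
    """Return (min_x, min_y, max_x, max_y) for a '[p1 TO p2]' string."""
    match = _RECT_PATTERN.match(text)
    if match is None:
        raise ValueError("not a [lowerLeft TO upperRight] rectangle: %r" % text)
    min_x, min_y = parse_point(match.group(1))
    max_x, max_y = parse_point(match.group(2))
    return min_x, min_y, max_x, max_y


print(parse_rectangle('["-180 -90" TO "180 90"]'))
# (-180.0, -90.0, 180.0, 90.0)
```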
Even though I don't have distributed-search implemented yet, the test extends
BaseDistributedSearchTestCase anyway. I dislike the idea of writing two tests
that test the same thing (one distributed, one not) when the infrastructure
should make it indifferent, since distribution is transparent to the input &
output I'm testing. Unfortunately, assertQ & friends are hard-coded to use TestHarness
which is in turn hard-coded to use an embedded Solr instance. And
unfortunately, BaseDistributedSearchTestCase doesn't let me test 0 shards (hey,
I haven't implemented that feature yet!). The patch tweaks
BaseDistributedSearchTestCase slightly to let me do this.
> facet.heatmap for spatial heatmap faceting on RPT
> -------------------------------------------------
>
> Key: SOLR-7005
> URL: https://issues.apache.org/jira/browse/SOLR-7005
> Project: Solr
> Issue Type: New Feature
> Components: spatial
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 5.1
>
> Attachments: SOLR-7005_heatmap.patch, heatmap_512x256.png,
> heatmap_64x32.png
>
>
> This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell
> counter in Lucene spatial LUCENE-6191. This is a form of faceting, and
> as such I think it should live in the "facet" parameter namespace. Here's
> what the parameters are:
> * facet=true
> * facet.heatmap=fieldname
> * facet.heatmap.bbox=\["-180 -90" TO "180 90"]
> * facet.heatmap.gridLevel=6
> * facet.heatmap.distErrPct=0.10
> Like other faceting features, the fieldName can have local-params to exclude
> filter queries or specify an output key.
> The bbox is optional; you get the whole world or you can specify a box or
> actually any shape that WKT supports (you get the bounding box of whatever
> you put).
> Ultimately, this feature needs to know the grid level, which together with
> the input shape will yield a certain number of cells. You can specify
> gridLevel exactly, or don't and instead provide distErrPct which is computed
> like it is for the RPT field type as seen in the schema. 0.10 yielded ~4k
> cells but it'll vary. There's also a facet.heatmap.maxCells safety net
> defaulting to 100k. Exceed this and you get an error.
> The output is (JSON):
> {noformat}
> {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0,
> 0, 2, 1, ....],[1, 1, 3, 2, ...],...]}
> {noformat}
> counts is null if all would be 0. Perhaps individual row arrays should
> likewise be null... I welcome feedback.
> I'm toying with an output format option in which you can specify a base-64'ed
> grayscale PNG.
> Obviously this should support sharded / distributed environments.
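As a usage illustration of the parameters listed in the issue description,
here is a sketch of how a client might assemble the query string. The
parameter names come from the description above; the base URL, the q
parameter value, and the field name "geo" are assumptions for the example, and
heatmap_query is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr endpoint; adjust host, port, and core name as needed.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def heatmap_query(field, bbox=None, grid_level=None, dist_err_pct=None):
    """Build a URL-encoded query string for a heatmap facet request."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.heatmap", field)]
    if bbox is not None:
        params.append(("facet.heatmap.bbox", bbox))
    if grid_level is not None:
        params.append(("facet.heatmap.gridLevel", str(grid_level)))
    if dist_err_pct is not None:
        params.append(("facet.heatmap.distErrPct", str(dist_err_pct)))
    return urlencode(params)


query = heatmap_query("geo", bbox='["-180 -90" TO "180 90"]', grid_level=6)
print(BASE_URL + "?" + query)

# Decoding the query string recovers the original bbox value.
print(parse_qs(query)["facet.heatmap.bbox"][0])
# ["-180 -90" TO "180 90"]
```

Letting urlencode handle the escaping avoids hand-quoting the brackets, quotes,
and spaces in the bbox value.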