[
https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-2844:
---------------------------------
Attachment: LUCENE-2844_spatial_benchmark.patch
I completely re-did this with a summer intern, Liviy Ambrose. It's similar to
the first approach but simpler, and it isn't based on it. Unlike the first patch,
it does *not* modify any of the existing benchmark code (aside from the
build.xml of course). I intend to enhance the benchmark code under separate
issues, so that this patch can focus on just spatial benchmarking.
h3. Test data
The build.xml grabs a tab-separated values file from geonames.org, which
contains millions of latitude & longitude based points. I want to take a
snapshot (for reproducible tests), randomize the line order, and put it on
http://people.apache.org/~dsmiley/. Additionally, Spatial4j's tests include a
file containing a WKT-formatted polygon for many countries. I want to host that
as well, in a format readable by LineDocSource.
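
A reproducible randomization of the line order could look roughly like the sketch below. This is only an illustration of the idea, not part of the patch: the class name LineShuffler, its method names, and the choice of a fixed seed are all assumptions, and it reads the whole file into memory, which may or may not be acceptable for the full geonames dump.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not in the patch): shuffles lines with a fixed seed. */
final class LineShuffler {

  /** Returns a shuffled copy; the same seed and input always give the same order. */
  static List<String> shuffled(List<String> lines, long seed) {
    List<String> copy = new ArrayList<>(lines);
    Collections.shuffle(copy, new Random(seed));
    return copy;
  }

  /** Reads all lines of {@code in}, shuffles them, and writes them to {@code out}. */
  static void shuffleFile(Path in, Path out, long seed) throws IOException {
    List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
    Files.write(out, shuffled(lines, seed), StandardCharsets.UTF_8);
  }
}
```

Fixing the seed is what makes the snapshot reproducible: anyone re-running the shuffle on the same input file gets the same line order.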
h3. Source files (only 3):
* GeonamesLineParser.java: This is designed for use with LineDocSource.
Geonames.org data comes in a tab-separated value file.
* SpatialDocMaker.java: This class is key.
** It holds a reference to the Lucene SpatialStrategy which it configures from
the algorithm file, mostly via factories. It's possible to test quite a variety
of spatial configurations, although it does assume RecursivePrefixTree.
** This DocMaker has the specialization to convert the shape-formatted string
in the body field to a Shape object to be indexed. It also has a configurable
ShapeConverter to optionally convert a point to a circle or bounding box.
* SpatialFileQueryMaker.java: Instead of hard-coded queries (as seen in the
non-spatial tests), it configures a private LineDocSource instance and reads
shapes from it to use as spatial queries. For now you'd use it with
GeonamesLineParser. Furthermore, it re-uses SpatialDocMaker's ShapeConverter so
that the points can then become circle or rectangle queries.
The provided spatial.alg shows how to use it.
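
To make the data flow above concrete, here is a minimal standalone sketch of the two steps: pulling a point out of a geonames line, and optionally widening that point into a bounding box. It does not use the benchmark or Spatial4j classes. The class GeonamesSketch and its methods are hypothetical names for illustration, the column positions (latitude in column 4, longitude in column 5, 0-based) follow the geonames.org dump layout, and the box math is a deliberate simplification that clamps at the poles and the dateline instead of wrapping.

```java
/** Hypothetical illustration only; not the classes in the patch. */
final class GeonamesSketch {

  /** Parses one tab-separated geonames line and returns {lon, lat}. */
  static double[] parseLonLat(String line) {
    // Limit -1 keeps trailing empty columns instead of dropping them.
    String[] cols = line.split("\t", -1);
    if (cols.length < 6) {
      throw new IllegalArgumentException("expected at least 6 tab-separated columns");
    }
    double lat = Double.parseDouble(cols[4]);
    double lon = Double.parseDouble(cols[5]);
    return new double[] {lon, lat};
  }

  /** Formats the point as a shape string (x = longitude first) for a body field. */
  static String toPointString(double lon, double lat) {
    return "POINT(" + lon + " " + lat + ")";
  }

  /**
   * Widens a point into a box of +/- halfSizeDeg degrees and returns
   * {minLon, minLat, maxLon, maxLat}. Simplification: values are clamped
   * to the valid range, so a box never crosses the dateline.
   */
  static double[] toBox(double lon, double lat, double halfSizeDeg) {
    double minLon = Math.max(-180.0, lon - halfSizeDeg);
    double maxLon = Math.min(180.0, lon + halfSizeDeg);
    double minLat = Math.max(-90.0, lat - halfSizeDeg);
    double maxLat = Math.min(90.0, lat + halfSizeDeg);
    return new double[] {minLon, minLat, maxLon, maxLat};
  }
}
```

The same conversion step can serve both sides: applied at index time it turns points into indexed rectangles, and applied at query time it turns points read from the file into rectangle queries.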
Notes:
* The spatial data is placed into the "body" field of a standard benchmark
DocData class as a string. Originally I experimented with a custom
SpatialDocData, but I determined it was unnecessary since the existing class
works. And after all, if you're testing spatial, you don't need to be
simultaneously testing text. I didn't put it in DocData's attached Properties
instance because that seems kinda heavyweight or at least medium-weight ;-)
The patch is *not* ready -- I need to add documentation, pending input on this
approach.
> benchmark geospatial performance based on geonames.org
> ------------------------------------------------------
>
> Key: LUCENE-2844
> URL: https://issues.apache.org/jira/browse/LUCENE-2844
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/benchmark
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 5.0, 4.6
>
> Attachments: benchmark-geo.patch, benchmark-geo.patch,
> LUCENE-2844_spatial_benchmark.patch
>
>
> See comments for details.
> In particular, the original patch "benchmark-geo.patch" is fairly different
> than LUCENE-2844.patch