[ 
https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-2844:
---------------------------------

    Attachment: LUCENE-2844_spatial_benchmark.patch

I completely re-did this with a summer intern, Liviy Ambrose. It's similar to 
the first approach but simpler, and it isn't based on it.  Unlike the first patch, 
it does *not* modify any of the existing benchmark code (aside from the 
build.xml of course). I intend to enhance the benchmark code under separate 
issues, so that this patch can focus on just spatial benchmarking.

h3. Test data
The build.xml grabs a tab-separated values file from geonames.org, which 
contains millions of latitude & longitude based points. I want to take a 
snapshot (for reproducible tests), randomize the line order, and put it on 
http://people.apache.org/~dsmiley/.  Additionally, Spatial4j's tests have a file 
containing a WKT-formatted polygon for many countries. I want to host that as 
well in a format readable by LineDocSource.
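
For illustration only, here is a minimal sketch of how the line order of such a snapshot could be randomized reproducibly. It is not part of the patch: the class name LineShuffler, the fixed seed, and the in-memory approach are all assumptions, and a file with millions of lines may need more heap or an external shuffle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not in the patch): shuffles lines with a fixed seed. */
final class LineShuffler {

    /** Returns a shuffled copy; the same seed always yields the same order. */
    static List<String> shuffle(List<String> lines, long seed) {
        List<String> copy = new ArrayList<>(lines); // leave the input untouched
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    /** Usage: java LineShuffler input.txt output.txt */
    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory, which is an assumption about file size.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        long seed = 42L; // arbitrary fixed seed, chosen only so runs are repeatable
        Files.write(Paths.get(args[1]), shuffle(lines, seed), StandardCharsets.UTF_8);
    }
}
```

Publishing the already-shuffled file, rather than shuffling at benchmark time, keeps every run reading identical input.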

h3. Source files (only 3):
* GeonamesLineParser.java: This is designed for use with LineDocSource.  
Geonames.org data comes in a tab-separated value file.
* SpatialDocMaker.java: This class is key.
** It holds a reference to the Lucene SpatialStrategy which it configures from 
the algorithm file, mostly via factories. It's possible to test quite a variety 
of spatial configurations, although it does assume RecursivePrefixTree.
** This DocMaker has the specialization to convert the shape-formatted string 
in the body field to a Shape object to be indexed.  It also has a configurable 
ShapeConverter to optionally convert a point to a circle or bounding box.
* SpatialFileQueryMaker.java: Instead of using hard-coded queries (as seen in 
other non-spatial tests), it configures a private LineDocSource instance and 
reads shapes from it to use as spatial queries. For now you'd use it with 
GeonamesLineParser. Furthermore, it re-uses SpatialDocMaker's ShapeConverter so 
that the points can then become circle or rectangle queries.
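
To make the data flow above concrete, here is a rough, self-contained sketch of the two steps: pulling a point out of a geonames.org line, then optionally widening that point into a bounding box. This is not the code in the patch. The class and method names are hypothetical, it avoids the Lucene and Spatial4j APIs entirely, and the box is a naive one in degrees that ignores dateline wrapping. The column positions (name at index 1, latitude at 4, longitude at 5) follow the geonames.org main dump layout.

```java
/** Hypothetical sketch; the patch's GeonamesLineParser and ShapeConverter may differ. */
final class GeonamesSketch {

    // Zero-based column positions in the geonames.org tab-separated dump.
    static final int NAME = 1, LAT = 4, LON = 5;

    /** Returns {name, "lon lat"}; the second element is what would go in the body field. */
    static String[] parseLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps empty columns
        if (cols.length <= LON) {
            throw new IllegalArgumentException("expected at least 6 tab-separated columns");
        }
        // Validate that both values are numeric, but keep the original text as-is.
        Double.parseDouble(cols[LAT]);
        Double.parseDouble(cols[LON]);
        return new String[] { cols[NAME], cols[LON] + " " + cols[LAT] };
    }

    /**
     * Naive point-to-rectangle conversion: returns {minX, maxX, minY, maxY}.
     * Latitude is clamped to [-90, 90]; longitude wrapping is not handled.
     */
    static double[] toBoundingBox(double lon, double lat, double halfSizeDeg) {
        double minY = Math.max(-90.0, lat - halfSizeDeg);
        double maxY = Math.min(90.0, lat + halfSizeDeg);
        return new double[] { lon - halfSizeDeg, lon + halfSizeDeg, minY, maxY };
    }

    public static void main(String[] args) {
        // Made-up record in the geonames column order, for demonstration only.
        String line = "1\tExample Town\tExample Town\t\t42.5\t1.5\tP\tPPL";
        String[] parsed = parseLine(line);
        System.out.println(parsed[0] + " -> " + parsed[1]);
    }
}
```

Because the same conversion step can be applied on both the indexing side and the query side, a single point data set can exercise point, circle, and rectangle cases.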

The provided spatial.alg shows how to use it. 

Notes:
* The spatial data is placed into the "body" field of a standard benchmark 
DocData class as a string. Originally I experimented with a custom 
SpatialDocData but I determined it was needless to do that since the existing 
class can work. And after all, if you're testing spatial, you don't need to be 
simultaneously testing text. I didn't put it in DocData's attached Properties 
instance because that seems kinda heavyweight or at least medium-weight ;-)  

The patch is *not* ready -- I need to add documentation, pending input on this 
approach.
                
> benchmark geospatial performance based on geonames.org
> ------------------------------------------------------
>
>                 Key: LUCENE-2844
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2844
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/benchmark
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 5.0, 4.6
>
>         Attachments: benchmark-geo.patch, benchmark-geo.patch, 
> LUCENE-2844_spatial_benchmark.patch
>
>
> See comments for details.
> In particular, the original patch "benchmark-geo.patch" is fairly different 
> than LUCENE-2844.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
