[ https://issues.apache.org/jira/browse/LUCENE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682396#comment-13682396 ]

Hal Deadman commented on LUCENE-4698:
-------------------------------------

We are seeing certain shapes cause Solr to use up all available heap space 
when a record containing one of them is indexed. Earlier, we were indexing 
polygons whose points ran clockwise instead of counter-clockwise; the 
resulting shape was so large that we ran out of memory. We fixed those 
polygons, but the following circle still eats up about 700MB of memory before 
we get an OutOfMemoryError (heap space) with a 1GB JVM heap:

Circle(3.0 90 d=0.0499542757922153)

Google Earth can't plot that circle either. Maybe it is invalid, or too close 
to the north pole because of its latitude of 90, but it would be nice if 
shapes could be validated before they cause an OOM error. 

The 4.5 million objects filling the heap are all GeohashPrefixTree$GhCell 
instances held in an ArrayList owned by PrefixTreeStrategy$CellTokenStream. 

Is there any way to set a maximum number of cells per shape, so that a shape 
exceeding it is considered too large and is not indexed? Is there a geo 
library that could validate that a shape is reasonably sized and bounded 
before it is processed?
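
As a rough illustration of the kind of pre-indexing check asked about above, 
here is a minimal sketch in plain Java. It is not a Spatial4j or Lucene API: 
CircleGuard, its methods, the cellSizeDeg parameter, and the maxCells cap are 
all hypothetical names. It also assumes, without confirmation from this 
issue, that a circle reaching a pole gets a bounding box spanning all 360 
degrees of longitude, which would explain a very large cell count.

```java
/** Hypothetical pre-indexing guard; not part of Lucene, Solr, or Spatial4j. */
final class CircleGuard {
    private CircleGuard() {}

    /** True when a circle (center latitude and radius in degrees) reaches or crosses a pole. */
    static boolean touchesPole(double latDeg, double radiusDeg) {
        return latDeg + radiusDeg >= 90.0 || latDeg - radiusDeg <= -90.0;
    }

    /**
     * Crude upper bound on the number of square grid cells of side cellSizeDeg
     * needed to cover the circle's bounding box. This is an estimate only; a
     * real prefix tree prunes and merges cells differently.
     */
    static double estimateCells(double latDeg, double radiusDeg, double cellSizeDeg) {
        if (cellSizeDeg <= 0.0) {
            throw new IllegalArgumentException("cellSizeDeg must be positive");
        }
        double minLat = Math.max(-90.0, latDeg - radiusDeg);
        double maxLat = Math.min(90.0, latDeg + radiusDeg);
        double heightDeg = maxLat - minLat;

        double widthDeg;
        if (touchesPole(latDeg, radiusDeg)) {
            // Assumption: a pole-touching circle covers every longitude.
            widthDeg = 360.0;
        } else {
            // Longitude span grows by 1/cos(latitude); use the latitude nearest a pole.
            double worstLat = Math.max(Math.abs(minLat), Math.abs(maxLat));
            widthDeg = Math.min(360.0, 2.0 * radiusDeg / Math.cos(Math.toRadians(worstLat)));
        }
        return Math.ceil(widthDeg / cellSizeDeg) * Math.ceil(heightDeg / cellSizeDeg);
    }

    /** Rejects a circle whose estimated cell count exceeds the caller's cap. */
    static boolean isSafeToIndex(double latDeg, double radiusDeg,
                                 double cellSizeDeg, double maxCells) {
        return estimateCells(latDeg, radiusDeg, cellSizeDeg) <= maxCells;
    }
}
```

With an illustrative cell size of 0.00125 degrees and a cap of 100,000 cells, 
CircleGuard.isSafeToIndex(90.0, 0.0499542757922153, 0.00125, 100_000) 
rejects the circle above, while the same radius centered at latitude 0 
passes. The cell size and cap would need tuning against the field's actual 
distErrPct and maxDistErr settings.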

We are currently using Solr 4.1. 

     <fieldType name="location_rpt"
                class="solr.SpatialRecursivePrefixTreeFieldType"
                spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
                geo="true" distErrPct="0.025" maxDistErr="0.000009"
                units="degrees" />




                
> Overhaul ShapeFieldCache because its a memory pig
> -------------------------------------------------
>
>                 Key: LUCENE-4698
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4698
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>
> The org.apache.lucene.spatial.util.ShapeFieldCache* classes together 
> implement a spatial field cache for points, similar to FieldCache for other 
> fields.  It supports a variable number of points per document, and it's 
> currently only used by the SpatialPrefixTree strategy because that's the only 
> strategy that supports a variable number of points per document.  The other 
> spatial strategies use the FieldCache.  The ShapeFieldCache has problems:
> * It's a memory pig. Each point is stored as a Point object, instead of an 
> array of x & y coordinates. Furthermore, each Point is in an ArrayList that 
> exists for each Document. It's not done any differently when your spatial 
> data isn't multi-valued.
> * The cache is not per-segment, it's per-IndexReader, thereby making it 
> un-friendly to NRT search.
> * The cache entries don't self-expire optimally to free up memory. The cache 
> is simply stored in a WeakHashMap<IndexReader,ShapeFieldCache>. The big cache 
> entries are only freed when the WeakHashMap is used and the JVM realizes the 
> IndexSearcher instance has been GC'ed.
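
To make the first point of the quoted description concrete, the following is 
a minimal sketch of storing a variable number of points per document in 
primitive arrays rather than as one Point object per point inside one 
ArrayList per document. PackedPoints and its methods are hypothetical names 
used only for illustration; this is not the layout the issue ended up with, 
just one common way to pack such data.

```java
/** Hypothetical packed layout: all coordinates in one double[], plus per-document offsets. */
final class PackedPoints {
    private final double[] xy;   // x0, y0, x1, y1, ... for all documents, in document order
    private final int[] start;   // start[d] = index of document d's first point; length = docs + 1

    /** Each inner array holds one document's points as x0, y0, x1, y1, ... (may be empty). */
    PackedPoints(double[][] flatPerDoc) {
        start = new int[flatPerDoc.length + 1];
        int total = 0;
        for (int d = 0; d < flatPerDoc.length; d++) {
            if (flatPerDoc[d].length % 2 != 0) {
                throw new IllegalArgumentException("odd coordinate count for doc " + d);
            }
            start[d] = total;
            total += flatPerDoc[d].length / 2;
        }
        start[flatPerDoc.length] = total;

        xy = new double[total * 2];
        for (int d = 0; d < flatPerDoc.length; d++) {
            System.arraycopy(flatPerDoc[d], 0, xy, start[d] * 2, flatPerDoc[d].length);
        }
    }

    /** Number of points stored for the given document. */
    int pointCount(int doc) {
        return start[doc + 1] - start[doc];
    }

    /** X coordinate of the i-th point of the given document. */
    double x(int doc, int i) {
        return xy[(start[doc] + i) * 2];
    }

    /** Y coordinate of the i-th point of the given document. */
    double y(int doc, int i) {
        return xy[(start[doc] + i) * 2 + 1];
    }
}
```

Here a document with a single point costs two doubles plus one int offset, 
with no per-point or per-document object headers.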
