[ 
https://issues.apache.org/jira/browse/JENA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512226#comment-17512226
 ] 

Lorenz Bühmann commented on JENA-2311:
--------------------------------------

OK, I see that using a counter variable is useless here; that was clearly a 
misconception on my side.

But then my question: why not use a Triple (or even a Tuple per predicate), 
which only contains references to the Node objects? Maybe you had some other 
drawbacks in mind when you decided to use a string as the hash key?
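
To make the question concrete, here is a minimal sketch of what I mean by a 
key that only holds references. It is not the actual QueryRewriteIndex code: 
GeoNode is a hypothetical stand-in for Jena's Node, and RewriteKey and 
ReferenceKeySketch are made-up names for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ReferenceKeySketch {

    /** Hypothetical stand-in for a node carrying a large geometry literal (not Jena's Node). */
    static final class GeoNode {
        final String lexicalForm;

        GeoNode(String lexicalForm) {
            this.lexicalForm = lexicalForm;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof GeoNode && ((GeoNode) o).lexicalForm.equals(lexicalForm);
        }

        @Override
        public int hashCode() {
            // String caches its hash code, so repeated calls do not rescan the literal.
            return lexicalForm.hashCode();
        }
    }

    /** Composite key: three references, no concatenated string is ever built. */
    static final class RewriteKey {
        final GeoNode subject;
        final String predicateUri;
        final GeoNode object;

        RewriteKey(GeoNode subject, String predicateUri, GeoNode object) {
            this.subject = subject;
            this.predicateUri = predicateUri;
            this.object = object;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RewriteKey)) return false;
            RewriteKey other = (RewriteKey) o;
            return subject.equals(other.subject)
                    && predicateUri.equals(other.predicateUri)
                    && object.equals(other.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicateUri, object);
        }
    }

    // Plain map for the sketch; a real index would also need size limits and expiry.
    private final Map<RewriteKey, Boolean> index = new HashMap<>();

    /** Returns the cached result, or null if this combination has not been stored. */
    Boolean get(GeoNode subject, String predicateUri, GeoNode object) {
        return index.get(new RewriteKey(subject, predicateUri, object));
    }

    void put(GeoNode subject, String predicateUri, GeoNode object, boolean result) {
        index.put(new RewriteKey(subject, predicateUri, object), result);
    }

    public static void main(String[] args) {
        ReferenceKeySketch sketch = new ReferenceKeySketch();
        GeoNode polygon = new GeoNode("POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))");
        GeoNode point = new GeoNode("POINT(1 1)");
        String predicate = "http://www.opengis.net/ont/geosparql#sfContains";

        sketch.put(polygon, predicate, point, true);
        System.out.println(sketch.get(polygon, predicate, point));
        System.out.println(sketch.get(point, predicate, polygon));
    }
}
```

Each key then costs three references plus the object header, regardless of 
how long the geometry literals are. Equality checks on a cache hit can still 
compare the lexical forms, so this is cheaper on allocation rather than free.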


Another off-topic question: from experiments I can say that using geospatial 
property functions is way slower than using the corresponding filter functions. 
The reason seems obvious: with the filter function style, the evaluation can 
make use of efficient joins without touching the node objects first. The 
property function style, on the other hand, does a separate graph.find call 
for each binding being evaluated.
Did you also try doing the query rewriting at the query or algebra level 
first, and perhaps come to the conclusion that it won't matter here?

By the way, thanks for building and working on the geospatial layer. Very nice 
piece of work to use.

> query rewrite index does too expensive caching on geo literals
> --------------------------------------------------------------
>
>                 Key: JENA-2311
>                 URL: https://issues.apache.org/jira/browse/JENA-2311
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: GeoSPARQL
>    Affects Versions: Jena 4.4.0
>            Reporter: Lorenz Bühmann
>            Priority: Major
>
> Using a GeoSPARQL query with a geospatial property function, e.g.
> {code:java}
> SELECT * {
> :x geo:hasGeometry ?geo1 .
> ?s2 geo:hasGeometry ?geo2 .
> ?geo1 geo:sfContains ?geo2
> }
> {code}
> leads to heavy memory consumption for larger datasets - and we're not talking 
> about big data at all. Imagine taking a polygon and checking millions of 
> geometries for containment in it.
> In the {{QueryRewriteIndex}} class, a key is generated for caching, but this 
> is horribly expensive: the string representation of the geometries is 
> requested millions of times, which creates millions of byte arrays and can 
> lead to an OOM exception - we got one with 8GB assigned.
> The key generation for reference:
> {code:java}
> String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + 
> predicate.getURI() + KEY_SEPARATOR + 
> objectGeometryLiteral.getLiteralLexicalForm();
> {code}
> My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}?) Guava 
> cache and use the numeric IDs instead to generate the cache key. Or any other 
> more efficient data structure; I'm not even sure a String is necessary.
> We tried a fix which works for us and keeps the memory consumption stable:
> {code:java}
>  private LoadingCache<Node, Integer> nodeIDCache;
>  private AtomicInteger cacheCounter;
> ...
>         cacheCounter = new AtomicInteger(0);
>         CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
>         if (maxSize > 0) {
>             builder = builder.maximumSize(maxSize);
>         }
>         if (expiryInterval > 0) {
>             builder = builder.expireAfterWrite(expiryInterval, TimeUnit.MILLISECONDS);
>         }
>         nodeIDCache = builder.build(
>                         new CacheLoader<>() {
>                             public Integer load(Node key) {
>                                 return cacheCounter.incrementAndGet();
>                             }
>                         });
> {code}



