Hi Marco,

2. The GeoSPARQL-Fuseki application has some convenience options for setting up the Fuseki server. It looks like the two minute delay is caused by applying RDFS inferencing to the dataset and then writing the results into the dataset (i.e. Jena operations). The GeoSPARQL schema has a class and property hierarchy that a user can apply to their dataset for some of the functionality. The inferencing is applied by default when loading a file, but also when connecting to a TDB, in case it hasn't been done manually by the user. The other potentially costly operation is creating "hasDefaultGeometry" properties, which is switched off by default.

The following line should lead to quicker loading the second time.

./geosparql-fuseki --loopback false --tdb TDB1 --inference false

I could change the setup so that file loading applies inferencing by default and TDB does not, but I thought picking one would be better for consistent behaviour. Always true means less burden for users working out why they might have a problem after loading their dataset.

There is probably a broader question as to how/if these options should be integrated into Fuseki, whether it should be a separate application, or whether they should be left out. I think they are useful to a user who is looking for a GeoSPARQL solution. Currently, GeoSPARQL-Fuseki is using the main/embedded server so doesn't have a GUI etc.

3. I get what you mean about the invalidity of the query now. The polygon is invalid because it is not closed. However, I'm unclear about how these errors and warnings are handled any differently from a SPARQL syntax error, where a Query Parse Exception is thrown with full stack trace. The error highlights the specific problem while the warning shows the context of the error and stack trace. This made it easier to hunt down these kinds of problems when they could be coming from a query or the dataset. What would you be looking for instead?
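
To make the closure rule concrete, here is a minimal sketch in Python of the kind of check involved. It is not the JTS or GeoSPARQL-Jena code, the function names are made up for illustration, and it only handles a plain 2D POLYGON without a spatial reference prefix.

```python
import re


def ring_is_closed(ring_text):
    """Return True if a WKT ring has at least 4 points and ends where it starts."""
    coords = [tuple(float(v) for v in pair.split()) for pair in ring_text.split(",")]
    return len(coords) >= 4 and coords[0] == coords[-1]


def polygon_rings_closed(wkt):
    """Check every ring of a simple 2D WKT POLYGON for closure (illustrative only)."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\s*(.*)\s*\)\s*", wkt,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a POLYGON literal: " + wkt)
    # Each ring is one parenthesised, comma-separated coordinate list.
    rings = re.findall(r"\(([^()]*)\)", match.group(1))
    return bool(rings) and all(ring_is_closed(ring) for ring in rings)


if __name__ == "__main__":
    print(polygon_rings_closed("POLYGON((-10 50,2 50,2 55,-10 55,-10 50))"))
    print(polygon_rings_closed("POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"))
```

The first polygon from your queries passes this check and the second does not, because its last coordinate (-10 51) differs from its first (-10 50).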

Thanks,

Greg

On 04/12/2018 12:01, Marco Neumann wrote:
comments inline

On Mon, Dec 3, 2018 at 5:14 PM Greg Albiston <galbis...@mail.com> wrote:

Hi Marco,

1. As mentioned this shouldn't be too difficult to support.

indeed not difficult but needs a decision

you could try with the following geonames dataset

all-geonames_lotico.ttl.gz



2. Yes, the indexing, or rather caching, is in-memory, but it is
on-demand. There shouldn't be any delay at start-up beyond what Jena
needs to do. The cost comes during query execution. The key invariant
data produced for solutions is retained for a short period of time (but
can be configured to be retained until termination). Some regularly
re-used info is always kept until termination (e.g. any spatial
reference system transformation that has been requested).

the following will create and populate the TDB dataset

./geosparql-fuseki --loopback false --rdf_file ./lm.ttl --tdb TDB1

I presume this message refers to the creation of the spatial cache / index

6:05:46.685 INFO  Applying GeoSPARQL Schema - Started
6:07:44.826 INFO  Applying GeoSPARQL Schema - Completed

next time I can call TDB directly

./geosparql-fuseki --loopback false --tdb TDB1

6:08:38.665 INFO  Applying GeoSPARQL Schema - Started
6:10:18.661 INFO  Applying GeoSPARQL Schema - Completed

takes approximately 2m for a very small data set. the same fuseki with
tdb+jena-spatial restarts almost instantaneously even with reasonably large
data sets (see geonames).


The main benefit of this is de-serialising geometry literals. The
spatial relations arguments are between a pair of geometry literals, one
of which is likely to be the same in the next solution, so keeping hold
of both means in a lot of cases the de-serialisation can be avoided for
one (and possibly both if still retained from a previous set of solutions).
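
A rough sketch of this retention idea, written in Python for brevity rather than taken from the GeoSPARQL-Jena source. The class name, the `parse` callable and the default retention period are all assumptions made for illustration; passing `max_age_seconds=None` stands in for "retain until termination".

```python
import time


class GeometryLiteralCache:
    """Keeps parsed geometry literals for a limited time after their last use."""

    def __init__(self, parse, max_age_seconds=5.0, clock=time.monotonic):
        self._parse = parse              # hypothetical parser: lexical form -> geometry
        self._max_age = max_age_seconds  # None means retain until termination
        self._clock = clock              # injectable so expiry can be exercised in tests
        self._entries = {}               # lexical form -> (geometry, last access time)
        self.parse_count = 0             # how many times the parser actually ran

    def get(self, lexical_form):
        now = self._clock()
        entry = self._entries.get(lexical_form)
        fresh = entry is not None and (
            self._max_age is None or now - entry[1] <= self._max_age
        )
        if fresh:
            geometry = entry[0]
        else:
            geometry = self._parse(lexical_form)
            self.parse_count += 1
        # Record the access time so regularly re-used literals stay cached.
        self._entries[lexical_form] = (geometry, now)
        return geometry

    def purge(self):
        """Drop entries that have not been used within the retention period."""
        if self._max_age is None:
            return
        now = self._clock()
        stale = [k for k, (_, used) in self._entries.items() if now - used > self._max_age]
        for key in stale:
            del self._entries[key]
```

With a filter such as sfWithin, the constant polygon literal is looked up once per solution but parsed only once while it stays within the retention period.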

might be a good idea to serialize the cache object of de-serialised
geometries to disk to speed up the boot process. maybe Andy could assist or
even align this with tdb


The aim was to only do work that's needed, not do repeat work and to be
generally quick (i.e. rely on JTS to be optimised for quick solutions
between the geometry pairs and Jena to optimise queries). There are 24
spatial relations and about half a dozen other functions so
pre-computing every combination gets big quickly and produces data that
users might not want/use.

A rough check of most of the spatial relations only requires a bounding box
intersection or type check, so negative results can be quickly
discarded.  I looked into caching and storing to file, but there just
wasn't the benefit in my use case. It took longer to load up then
execute than just execute from fresh and cache. Also, the spatial
indexes implemented by JTS aren't designed/suited for the spatial
relations. If there is a use-case that gets more benefit from
pre-computing or storing between programme execution then I'm sure it
can be adapted for, but in the context of GeoSPARQL this approach was
effective.
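
As an illustration of the bounding box rough check, here is a minimal Python sketch. It is not how JTS implements it, the function names are hypothetical, and geometries are reduced to plain lists of (x, y) pairs.

```python
def envelope(coords):
    """Bounding box of a list of (x, y) pairs as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b):
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def might_be_within(inner_coords, outer_coords):
    """Cheap pre-check for a 'within' style relation.

    False means the relation definitely does not hold, so the exact
    (expensive) geometry test can be skipped. True only means the exact
    test still has to be run.
    """
    return envelopes_intersect(envelope(inner_coords), envelope(outer_coords))


if __name__ == "__main__":
    polygon = [(-10, 50), (2, 50), (2, 55), (-10, 55), (-10, 50)]
    print(might_be_within([(0, 52)], polygon))   # candidate, needs the exact test
    print(might_be_within([(40, 10)], polygon))  # discarded by the rough check
```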

3. If you could send me the dataset that causes these errors then I'll
happily have a look into it.

you can use this simple list of point geometries here

http://www.lotico.com/lm.ttl.gz

this query will parse and execute

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?well
WHERE {
   ?well <http://www.wikidata.org/entity/P625> ?geometry .
   FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 50))"^^geo:wktLiteral))
} LIMIT 10

this one will parse and fail

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?well
WHERE {
   ?well <http://www.wikidata.org/entity/P625> ?geometry .
   FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^geo:wktLiteral))
} LIMIT 10

warn/error messages

6:17:45.887 ERROR Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
6:17:45.887 WARN  General exception in (<http://www.opengis.net/def/function/geosparql/sfWithin> ?geometry "POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)
org.apache.jena.datatypes.DatatypeFormatException: Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
         at io.github.galbiston.geosparql_jena.implementation.datatype.WKTDatatype.parse(WKTDatatype.java:109)
         at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:905)
         at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:834)
         at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:57)
         at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:42)
         at org.apache.jena.sparql.function.FunctionBase2.exec(FunctionBase2.java:55)
         at org.apache.jena.sparql.function.FunctionBase.exec(FunctionBase.java:63)
         at org.apache.jena.sparql.expr.E_Function.evalSpecial(E_Function.java:89)
         at org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:100)
         at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:41)
         at org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
         at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:69)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
         at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
         at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
         at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:55)
         at org.apache.jena.fuseki.servlets.SPARQL_Query.executeQuery(SPARQL_Query.java:350)
         at org.apache.jena.fuseki.servlets.SPARQL_Query.execute(SPARQL_Query.java:288)
         at org.apache.jena.fuseki.servlets.SPARQL_Query.executeWithParameter(SPARQL_Query.java:242)
         at org.apache.jena.fuseki.servlets.SPARQL_Query.perform(SPARQL_Query.java:217)
         at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:183)
         at org.apache.jena.fuseki.servlets.ActionService.execCommonWorker(ActionService.java:98)
         at org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:74)
         at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.Server.handle(Server.java:503)
         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
         at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
         at java.base/java.lang.Thread.run(Thread.java:834)




4. The "geo:" prefix is the one used throughout the GeoSPARQL
documentation, so has been used for consistency when needed. The code
doesn't have a dependency on the "geo:" prefix, so there is no
requirement on the user. It would probably cause more confusion to those
following GeoSPARQL examples to not use the "geo:" prefix when necessary.


I know but it needs some discussion about re-purposing of prefixes here



Thanks,

Greg

On 03/12/2018 15:46, Marco Neumann wrote:
Hi Greg, ok let's do it in the dev list first.

1. indeed the picking up of lat/long is a common if not the most common use
case for building a spatial index. last but not least to perform a
proximity search on 2D point geometries. (I know that the ogc recommends a
transformation path with a sparql query to turn lat / long into WKT
geometry datatypes; maybe we could provide this as a convenient option with
the release)

2. as far as I can see the spatial index in geosparql-jena is memory based.
it creates additional load time during server startup. Am I missing
something here, is there a file based spatial index as well?

3. error handling is disruptive. since we are hitting the spatial index
first during query execution I am seeing a number of unpleasant side
effects with syntactically correct sparql but semantically incorrect
spatial queries. e.g.

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?well
WHERE {
     ?well  <http://www.wikidata.org/entity/P625> ?geometry .
    FILTER(geof:sfWithin(?geometry,"POLYGON((-77 38,-77 0,0 38,0 0,0 0))"^^geo:wktLiteral))
} LIMIT 10

4. The re-use of the geo: prefix really isn't your problem I know but it
will create confusion. Wouldn't geosparql: be a better prefix for this?
Is the OGC now married to this prefix? It used to be
http://www.w3.org/2003/01/geo/wgs84_pos#

and there is more to come...

again thank you for working on this with your team Greg, much appreciated.







On Mon, Dec 3, 2018 at 2:15 PM Greg Albiston <galbis...@mail.com> wrote:

Hi Marco,

I've had a look at the documentation for Jena Spatial and it would seem
the main data change is the use of the Lat/Lon pairs.
This doesn't comply with the GeoSPARQL standard so support for this
would be a Jena extension.

This could be accommodated by a property function to convert to a WKT
Point literal with WGS84/CRS84 spatial reference.
Users would then be able to use the result in query for any of the
GeoSPARQL functions.
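
To show what such a conversion would produce, here is a small Python sketch (the real thing would be a Jena property function in Java, and the function name below is made up). The one detail to watch is that CRS84 uses longitude/latitude axis order, so the pair has to be swapped when building the literal.

```python
# CRS84 URI as used by GeoSPARQL for WGS84 longitude/latitude coordinates.
CRS84_URI = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def lat_lon_to_wkt_point(lat, lon):
    """Build the lexical form of a WKT point literal from a lat/lon pair.

    Hypothetical helper for illustration only. CRS84 expects the
    coordinates in longitude, latitude order.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: {}".format(lat))
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: {}".format(lon))
    return "<{}> POINT({} {})".format(CRS84_URI, lon, lat)


if __name__ == "__main__":
    print(lat_lon_to_wkt_point(51.5074, -0.1278))
```

The resulting string would be typed as geo:wktLiteral before being handed to any of the GeoSPARQL functions.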

Alternatively, the spatial relations could all have an extra property
function defined that provides the conversion and hands over to the GeoSPARQL
equivalent property function. This wouldn't take long to integrate as
individual spatial relation property functions are very minimal.

The other item that jumps out is the Jena spatial property functions.

spatial:nearby, spatial:withinCircle, spatial:withinBox and
spatial:intersectBox all seem to be variations of Simple Features
spatial relations that are covered by GeoSPARQL. These property
functions can be incorporated for backward compatibility, but the question is
whether these should just be offered for the current Lat/Lon pairs or expanded
to accept geometry literals (i.e. WKT, GML etc.). The latter option
shouldn't be hard to provide for the same reason as above.

spatial:north, spatial:south, spatial:west and spatial:east are not in
GeoSPARQL. Again it's a question of whether these should be provided more
generally for WKT, GML geometry literals. There might need to be a bit
of extra work handling both geographic and planar spatial reference
systems, as Jena Spatial is only doing a single spatial reference system.

I don't think it would be too difficult to support the existing Jena
Spatial functionality, at least based on the webpage
(https://jena.apache.org/documentation/query/spatial-query.html), as an
extension to what is provided by GeoSPARQL.

Is there anything else that you were concerned about?

Thanks,

Greg


On 03/12/2018 10:53, Marco Neumann wrote:
so I've had a look at this and while I think geosparql-jena is a very
welcome contribution to the jena project I don't think we should rush with
the retirement of jena-spatial at this point as Greg's approach will
require users to make changes to their existing data.

I will engage Greg on us...@jena.apache.org again to clarify a few things
and hopefully get more people involved in this conversation around spatial,
geosparql and jena.



On Fri, Nov 30, 2018 at 1:23 PM Marco Neumann <marco.neum...@gmail.com> wrote:

how quickly can you hook geosparql into the release?

this would make lucene spatial obsolete in the next release.  has Greg
released performance benchmarks for his implementation? as I said I will
take a look at it over the weekend when time permits.

On Fri, Nov 30, 2018 at 11:02 AM Andy Seaborne <a...@apache.org> wrote:
We could retire jena-spatial immediately after 3.10.0 - given the Lucene
change that might be smoother, one release with updated dependencies.

If that is the way forward, I think it is (mildly) better to take it out
of the Fuseki/Full build in 3.10.0.

        Andy

On 29/11/2018 17:00, Marco Neumann wrote:
I will have to look into that I guess since I am a frequent user of spatial
data.

why not go to 7.5? was there an incompatibility?

On Thu 29. Nov 2018 at 16:53, Andy Seaborne <a...@apache.org> wrote:
Jena 3.10.0 would be around the end of the year. I'd like to make use of
Greg's GeoSPARQL project the "headline" item for the release and to
retire jena-spatial in 3.10.0 as an indication of this.

Because retirement is a new process for the project, I'm sending this
first 3.10.0 message quite early to give us discussion time.

== Retirements

We have talked about this before but not actually done anything. See
separate thread for discussion on retirement process and for the first
modules:

jena-spatial
jena-fuseki1
jena-csv

== Headlines

JENA-664 : GeoSPARQL support

I'd like to make use of Greg's GeoSPARQL project the "headline" item for
the release and to retire jena-spatial in 3.10.0 as an indication of this.

JENA-1621 : Lucene upgrade to 7.4
        May need to reload lucene indexes.
(e.g. if the lucene index was originally created with Lucene v5.x, prior to
Jena 3.3.0). See Lucene upgrade tool.
https://lucene.apache.org/solr/guide/7_4/indexupgrader-tool.html

JENA-1623 : Fuseki security
JENA-1627 : HTTP support
https://issues.apache.org/jira/browse/JENA-1623

http://jena.staging.apache.org/documentation/fuseki2/data-access-control

== JIRA:

31 currently.

https://s.apache.org/jena-3.10.0-jira

== Updates

Only plugins. JENA-1624

surefire : 2.21.0 -> 2.22.1 (+ SUREFIRE-1588)
compiler : 3.7.0 -> 3.8.0
shade    : 3.1.0 -> 3.2.0

            Andy

--


---
Marco Neumann
KONA


