comments inline
On Mon, Dec 3, 2018 at 5:14 PM Greg Albiston <galbis...@mail.com> wrote:
Hi Marco,
1. As mentioned this shouldn't be too difficult to support.
indeed not difficult but needs a decision
you could try with the following geonames dataset
all-geonames_lotico.ttl.gz
2. Yes, the indexing, or rather caching, is in-memory, but it is
on-demand. There shouldn't be any delay at start-up beyond what Jena
needs to do. The cost comes during query execution. The key invariant
data produced for solutions is retained for a short period of time (but
can be configured to be retained until termination). Some regularly
re-used info is always kept until termination (e.g. any spatial
reference system transformation that has been requested).
the following will create and populate the TDB dataset
./geosparql-fuseki --loopback false --rdf_file ./lm.ttl --tdb TDB1
I presume this message refers to the creation of the spatial cache / index
6:05:46.685 INFO Applying GeoSPARQL Schema - Started
6:07:44.826 INFO Applying GeoSPARQL Schema - Completed
next time I can call TDB directly
./geosparql-fuseki --loopback false --tdb TDB1
6:08:38.665 INFO Applying GeoSPARQL Schema - Started
6:10:18.661 INFO Applying GeoSPARQL Schema - Completed
takes approximately 2m for a very small data set. the same fuseki with
tdb+jena-spatial restarts almost instantaneously even with reasonably large
data sets (see geonames).
The main benefit of this is de-serialising geometry literals. The
spatial relations arguments are between a pair of geometry literals, one
of which is likely to be the same in the next solution, so keeping hold
of both means in a lot of cases the de-serialisation can be avoided for
one (and possibly both if still retained from a previous set of solutions).
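As a rough illustration of the on-demand caching described above, here is a minimal Python sketch, not the geosparql-jena implementation. The class name `ExpiringCache`, the `ttl_seconds` parameter and the injected `parse` callable are all hypothetical; `ttl_seconds=None` stands in for "retain until termination".

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class ExpiringCache(Generic[T]):
    """Parse a literal on first use, then reuse the result until it expires."""

    def __init__(
        self,
        parse: Callable[[str], T],
        ttl_seconds: Optional[float] = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._parse = parse        # the expensive de-serialisation step (assumed)
        self._ttl = ttl_seconds    # None means "retain until termination"
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, T]] = {}
        self.misses = 0            # how often parse() actually had to run

    def get(self, literal: str) -> T:
        now = self._clock()
        entry = self._entries.get(literal)
        if entry is not None:
            stored_at, value = entry
            if self._ttl is None or now - stored_at <= self._ttl:
                return value       # cache hit: no de-serialisation needed
        # Nothing cached yet, or the entry has expired: do the work once.
        self.misses += 1
        value = self._parse(literal)
        self._entries[literal] = (now, value)
        return value


# The same literal appears in consecutive solutions, so it is parsed only once.
cache = ExpiringCache(parse=lambda wkt: wkt.strip().upper(), ttl_seconds=None)
for _ in range(3):
    cache.get("point(-1.5 53.0)")
print(cache.misses)
```

Nothing is computed at start-up in this sketch; the cost is paid the first time a literal is seen during query execution, which matches the trade-off described above.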
might be a good idea to serialize the cache object of de-serialised
geometries to disk to speed up the boot process. maybe Andy could assist or
even align this with tdb
The aim was to only do work that's needed, not do repeat work and to be
generally quick (i.e. rely on JTS to be optimised for quick solutions
between the geometry pairs and Jena to optimise queries). There are 24
spatial relations and about half a dozen other functions so
pre-computing every combination gets big quickly and produces data that
users might not want/use.
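To put a number on "gets big quickly", here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that pre-computation would store one boolean per relation for every ordered pair of distinct geometries; the function name is hypothetical and the other half dozen functions are ignored.

```python
def precomputed_result_count(n_geometries: int, n_relations: int = 24) -> int:
    """Results needed to pre-compute every relation for every ordered pair."""
    if n_geometries < 0 or n_relations < 0:
        raise ValueError("counts must be non-negative")
    # Ordered pairs, because relations such as within/contains are not symmetric.
    ordered_pairs = n_geometries * (n_geometries - 1)
    return ordered_pairs * n_relations


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} geometries -> {precomputed_result_count(n):,} results")
```

Under that assumption, 10,000 geometries already imply about 2.4 billion stored results, most of which a given query would never touch.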
A rough check of most of the spatial relations only requires a bounding box
intersection or type check, so negative results can be quickly
discarded. I looked into caching and storing to file, but there just
wasn't the benefit in my use case. It took longer to load up and then
execute than to just execute from fresh and cache. Also, the spatial
indexes implemented by JTS aren't designed/suited for the spatial
relations. If there is a use-case that gets more benefit from
pre-computing or storing between programme execution then I'm sure it
can be adapted for, but in the context of GeoSPARQL this approach was
effective.
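The bounding-box rough check mentioned above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names, working on plain coordinate lists rather than JTS geometries: a `False` result is a definite negative, while `True` only means the exact (and more expensive) relation test still has to run.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned box containing all coordinates of a geometry."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: Box, b: Box) -> bool:
    """True when the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def may_be_within(candidate: Sequence[Point], target: Sequence[Point]) -> bool:
    """Cheap pre-check: False is a definite 'no', True means 'run the exact test'."""
    return boxes_intersect(bounding_box(candidate), bounding_box(target))


# The query polygon used elsewhere in this thread, as (x, y) pairs.
area = [(-10, 50), (2, 50), (2, 55), (-10, 55), (-10, 50)]
print(may_be_within([(-1.5, 53.0)], area))  # inside the box: exact test needed
print(may_be_within([(13.4, 52.5)], area))  # outside the box: discarded early
```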
3. If you could send me the dataset that causes these errors then I'll
happily have a look into it.
you can use this simple list of point geometries here
http://www.lotico.com/lm.ttl.gz
this query will parse and execute
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?well
WHERE {
?well <http://www.wikidata.org/entity/P625> ?geometry .
FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 50))"^^geo:wktLiteral))
} LIMIT 10
this one will parse and fail
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?well
WHERE {
?well <http://www.wikidata.org/entity/P625> ?geometry .
FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^geo:wktLiteral))
} LIMIT 10
warn/error messages
6:17:45.887 ERROR Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
6:17:45.887 WARN General exception in (<http://www.opengis.net/def/function/geosparql/sfWithin> ?geometry "POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)
org.apache.jena.datatypes.DatatypeFormatException: Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
at io.github.galbiston.geosparql_jena.implementation.datatype.WKTDatatype.parse(WKTDatatype.java:109)
at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:905)
at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:834)
at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:57)
at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:42)
at org.apache.jena.sparql.function.FunctionBase2.exec(FunctionBase2.java:55)
at org.apache.jena.sparql.function.FunctionBase.exec(FunctionBase.java:63)
at org.apache.jena.sparql.expr.E_Function.evalSpecial(E_Function.java:89)
at org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:100)
at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:41)
at org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:69)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:55)
at org.apache.jena.fuseki.servlets.SPARQL_Query.executeQuery(SPARQL_Query.java:350)
at org.apache.jena.fuseki.servlets.SPARQL_Query.execute(SPARQL_Query.java:288)
at org.apache.jena.fuseki.servlets.SPARQL_Query.executeWithParameter(SPARQL_Query.java:242)
at org.apache.jena.fuseki.servlets.SPARQL_Query.perform(SPARQL_Query.java:217)
at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:183)
at org.apache.jena.fuseki.servlets.ActionService.execCommonWorker(ActionService.java:98)
at org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:74)
at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:503)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.base/java.lang.Thread.run(Thread.java:834)
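The failing literal differs from the working one only in its last coordinate (-10 51 instead of -10 50), so the ring no longer ends where it started. A client could catch this before sending the query. The following Python sketch is a hypothetical client-side check, not part of geosparql-jena or JTS, and it only handles simple 2D POLYGON literals without a CRS IRI prefix.

```python
import re
from typing import List, Tuple

Ring = List[Tuple[float, float]]


def parse_polygon_rings(wkt: str) -> List[Ring]:
    """Extract the coordinate rings of a simple 2D WKT POLYGON."""
    match = re.fullmatch(
        r"\s*POLYGON\s*\(\s*(.*)\s*\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("not a simple 2D POLYGON literal")
    rings: List[Ring] = []
    for ring_text in re.findall(r"\(([^()]*)\)", match.group(1)):
        ring: Ring = []
        for pair in ring_text.split(","):
            x, y = pair.split()  # tolerates line wraps inside the literal
            ring.append((float(x), float(y)))
        rings.append(ring)
    return rings


def is_closed(ring: Ring) -> bool:
    """A ring needs at least four positions, and the first must equal the last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def polygon_rings_are_closed(wkt: str) -> bool:
    rings = parse_polygon_rings(wkt)
    return bool(rings) and all(is_closed(ring) for ring in rings)


print(polygon_rings_are_closed("POLYGON((-10 50,2 50,2 55,-10 55,-10 50))"))
print(polygon_rings_are_closed("POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"))
```

The first call reports a closed ring and the second does not, which corresponds to the "Points of LinearRing do not form a closed linestring" error in the log above.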
4. The "geo:" prefix is the one used throughout the GeoSPARQL
documentation, so has been used for consistency when needed. The code
doesn't have a dependency on the "geo:" prefix, so there is no
requirement on the user. It would probably cause more confusion to those
following GeoSPARQL examples to not use the "geo:" prefix when necessary.
I know but it needs some discussion about re-purposing of prefixes here
Thanks,
Greg
On 03/12/2018 15:46, Marco Neumann wrote:
Hi Greg, ok let's do it in the dev list first.
1. indeed the picking up of lat/long is a common if not the most common use
case for building a spatial index. last but not least to perform a
proximity search on 2D point geometries. (I know that the ogc recommends a
transformation path with a sparql query to turn lat / long into WKT
geometry datatypes. maybe we could provide this as a convenient option with
the release)
2. as far as I can see the spatial index in geosparql-jena is memory based.
it creates additional load time during server startup. Am I missing
something here, is there a file based spatial index as well?
3. error handling is disruptive. since we are hitting the spatial index
first during query execution I am seeing a number of unpleasant side
effects with syntactically correct sparql but semantically incorrect
spatial queries. e.g.
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?well
WHERE {
?well <http://www.wikidata.org/entity/P625> ?geometry .
FILTER(geof:sfWithin(?geometry,"POLYGON((-77 38,-77 0,0 38,0 0,0 0))"^^geo:wktLiteral))
} LIMIT 10
4. The re-use of the geo: prefix really isn't your problem I know but it
will create confusion. Wouldn't geosparql: be a better prefix for this? Is
the OGC now married to this prefix? It used to be
http://www.w3.org/2003/01/geo/wgs84_pos#
and there is more to come...
again thank you for working on this with your team Greg, much
appreciated.
On Mon, Dec 3, 2018 at 2:15 PM Greg Albiston <galbis...@mail.com> wrote:
Hi Marco,
I've had a look at the documentation for Jena Spatial and it would seem
the main data change is the use of the Lat/Lon pairs.
This doesn't comply with the GeoSPARQL standard so support for this
would be a Jena extension.
This could be accommodated by a property function to convert to a WKT
Point literal with WGS84/CRS84 spatial reference.
Users would then be able to use the result in query for any of the
GeoSPARQL functions.
Alternatively, the spatial relations could all have an extra property
function defined, provide the conversion and hand over to the GeoSPARQL
equivalent property function. This wouldn't take long to integrate as
individual spatial relation property functions are very minimal.
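The Lat/Lon to WKT conversion discussed above is small enough to sketch. The Python below is only an illustration of the string transformation, with a hypothetical function name; it is not the proposed Jena property function. It relies on the GeoSPARQL convention that CRS84 coordinates are written longitude first, then latitude.

```python
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def _fmt(value: float) -> str:
    """Fixed-point text without trailing zeros (avoids exponent notation)."""
    text = format(value, ".8f").rstrip("0").rstrip(".")
    return "0" if text in ("-0", "") else text


def lat_lon_to_wkt(lat: float, lon: float, include_crs: bool = True) -> str:
    """Build the lexical form of a geo:wktLiteral point from a lat/lon pair."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    # CRS84 axis order is longitude, latitude: the reverse of "lat/long".
    point = f"POINT({_fmt(lon)} {_fmt(lat)})"
    return f"<{CRS84}> {point}" if include_crs else point


print(lat_lon_to_wkt(51.5074, -0.1278))
print(lat_lon_to_wkt(50, 2, include_crs=False))
```

The CRS IRI is optional in a wktLiteral because CRS84 is the default when none is given; it is included here by default only to make the axis order explicit.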
The other item that jumps out is the Jena spatial property functions.
spatial:nearby, spatial:withinCircle, spatial:withinBox and
spatial:intersectBox all seem to be variations of Simple Features
spatial relations that are covered by GeoSPARQL. These property
functions can be incorporated for backward compatibility, but the question
is whether they should just be offered for the current Lat/Lon pairs or
expanded to accept geometry literals (i.e. WKT, GML etc.). The latter option
shouldn't be hard to provide for the same reason as above.
spatial:north, spatial:south, spatial:west and spatial:east are not in
GeoSPARQL. Again it's a question of whether these should be provided more
generally for WKT, GML geometry literals. There might need to be a bit
of extra work handling both geographic and planar spatial reference
systems, as Jena Spatial only handles a single spatial reference system.
I don't think it would be too difficult to support the existing Jena
Spatial functionality, at least based on the webpage
(https://jena.apache.org/documentation/query/spatial-query.html), as an
extension to what is provided by GeoSPARQL.
Is there anything else that you were concerned about?
Thanks,
Greg
On 03/12/2018 10:53, Marco Neumann wrote:
so I've had a look at this and while I think geosparql-jena is a very
welcome contribution to the jena project I don't think we should rush with
the retirement of jena-spatial at this point as Greg's approach will
require users to make changes to their existing data.
I will engage Greg on us...@jena.apache.org again to clarify a few things
and hopefully get more people involved in this conversation around spatial,
geosparql and jena.
On Fri, Nov 30, 2018 at 1:23 PM Marco Neumann <marco.neum...@gmail.com> wrote:
how quickly can you hook geosparql into the release?
this would make lucene spatial obsolete in the next release. has Greg
released performance benchmarks for his implementation? as I said I will
take a look at it over the weekend when time permits.
On Fri, Nov 30, 2018 at 11:02 AM Andy Seaborne <a...@apache.org> wrote:
We could retire jena-spatial immediately after 3.10.0 - given the Lucene
change that might be smoother, one release with updated dependencies.
If that is the way forward, I think it is (mildly) better to take it out
of the Fuseki/Full build in 3.10.0.
Andy
On 29/11/2018 17:00, Marco Neumann wrote:
I will have to look into that I guess since I am a frequent user of spatial
data.
why not go to 7.5? was there an incompatibility?
On Thu 29. Nov 2018 at 16:53, Andy Seaborne <a...@apache.org> wrote:
Jena 3.10.0 would be around the end of the year. I'd like to make use of
Greg's GeoSPARQL project the "headline" item for the release and to
retire jena-spatial in 3.10.0 as an indication of this.
Because retirement is a new process for the project, I'm sending this
first 3.10.0 message quite early to give us discussion time.
== Retirements
We have talked about this before but not actually done anything. See
separate thread for discussion on retirement process and for the first
modules:
jena-spatial
jena-fuseki1
jena-csv
== Headlines
JENA-664 : GeoSPARQL support
I'd like to make use of Greg's GeoSPARQL project the "headline" item for
the release and to retire jena-spatial in 3.10.0 as an indication of
this.
JENA-1621 : Lucene upgrade to 7.4
May need to reload lucene indexes
(e.g. if the lucene index was created originally with Lucene v5.x, prior
to Jena 3.3.0). See Lucene upgrade tool.
https://lucene.apache.org/solr/guide/7_4/indexupgrader-tool.html
JENA-1623 : Fuseki security
JENA-1627 : HTTP support
https://issues.apache.org/jira/browse/JENA-1623
http://jena.staging.apache.org/documentation/fuseki2/data-access-control
== JIRA:
31 currently.
https://s.apache.org/jena-3.10.0-jira
== Updates
Only plugins. JENA-1624
surefire : 2.21.0 -> 2.22.1 (+ SUREFIRE-1588)
compiler : 3.7.0 -> 3.8.0
shade : 3.1.0 -> 3.2.0
Andy
--
---
Marco Neumann
KONA