Good idea if combined with batching, or with switching to evaluating once and
joining locally with any results from earlier in the execution.
... otherwise it can cause problems for the remote service and look like
a denial-of-service attack.
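To make the batching idea concrete, here is a rough application-side sketch in Python. It is not how ARQ evaluates SERVICE; it only illustrates grouping already-known bindings into a VALUES block so that one remote request replaces many. The helper names (`chunked`, `batched_service_query`), the example.org URLs, and the batch size are all hypothetical, and the sketch assumes the IRIs need no escaping.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batched_service_query(endpoint: str, var: str, iris: List[str], pattern: str) -> str:
    """Build one query whose SERVICE block carries a whole batch as VALUES.

    Assumption: `iris` are plain IRIs that can be wrapped in <...> unescaped.
    """
    values = " ".join(f"<{iri}>" for iri in iris)
    return (
        f"SELECT * WHERE {{\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    VALUES ?{var} {{ {values} }}\n"
        f"    {pattern}\n"
        f"  }}\n"
        f"}}"
    )


# Hypothetical data: five subjects bound earlier in the execution.
subjects = [f"http://example.org/s{i}" for i in range(5)]
queries = [
    batched_service_query("http://example.org/sparql", "s", batch, "?s ?p ?o .")
    for batch in chunked(subjects, 2)
]
```

With a batch size of 2, the five bindings above turn into three remote requests instead of five; the results would then be joined locally with the earlier bindings.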
Andy
On 17/06/2019 09:38, Marco Neumann wrote:
Very good Lorenz, looks like Dave is already on the job as we speak.
On Mon, Jun 17, 2019 at 9:32 AM Lorenz <[email protected]> wrote:
At least it looks like a good time to discuss it given that there is
already some ongoing work on the SERVICE feature?
The mailing list topic is called "Batching federated calls using VALUES
block"; the initial question was in April [1], and the latest status mail
w.r.t. the external contribution was this month [2].
[1]
https://lists.apache.org/thread.html/ebfbeb950d43ef1f92057c1c4de12bb42f3db1f4a7afc6601243c9c2@%3Cusers.jena.apache.org%3E
[2]
https://lists.apache.org/thread.html/7d85cad8dc54c0bf8fde73ec879a3520127d1f8f70192785d2623874@%3Cusers.jena.apache.org%3E
OK that's useful Lorenz, thank you. I see AKSW is evaluating a number of
solutions here
https://svn.aksw.org/papers/2017/FedEval-summary/public.pdf
Since Fuseki is thread-safe, one can certainly delegate the query
segmentation to the application logic and issue multiple queries to
individual (Fuseki or any other) endpoints concurrently.
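As a minimal sketch of that application-side approach (in Python, outside Jena, and not a statement about ARQ internals): the same query is sent to every endpoint from a bounded thread pool, so one slow endpoint does not hold up the others. `http_select` and `fan_out` are hypothetical helper names; the request follows the standard SPARQL protocol (form-encoded POST, JSON results), and the worker limit and timeout are arbitrary assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_select(endpoint: str, query: str, timeout: float = 30.0) -> List[dict]:
    """Run a SELECT via the SPARQL protocol and return the JSON bindings."""
    req = Request(
        endpoint,
        data=urlencode({"query": query}).encode("utf-8"),  # data => POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


def fan_out(
    endpoints: List[str],
    query: str,
    fetch: Callable[[str, str], List[dict]] = http_select,
    max_workers: int = 8,  # assumed cap on concurrent requests
) -> Dict[str, object]:
    """Query every endpoint concurrently.

    Returns a dict mapping each endpoint to its rows, or to the exception
    it raised, so one failing shard does not abort the whole run.
    """
    def safe(endpoint: str) -> object:
        try:
            return fetch(endpoint, query)
        except Exception as exc:  # keep the failure next to its endpoint
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(endpoints, pool.map(safe, endpoints)))
```

The `fetch` parameter is injectable so the fan-out can be tried with a stub before pointing it at real endpoints; merging the per-endpoint rows is left to the application.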
The use case here is to work with a large sharded dataset from one query.
Latency is currently not of the essence to the use case but could be
mitigated by hoarding nodes on the same network segment.
I just wonder if the threading of SERVICE would require any significant
rewrite of ARQ or if this is already an encapsulated process that lends
itself to threading.
On Mon, Jun 17, 2019 at 6:36 AM Lorenz <[email protected]> wrote:
Honestly, with that extensive use of SERVICE feature it clearly would
make sense to make use of parallel execution. Never heard of such a
query, but sounds like fun.
What is the use-case here? Can you give some insights? Are all of them
remote SPARQL services?
By the way, did you ever consider or even try one of the existing
federated query engines like FedX, ANAPSID, HIBISCUS, etc.? I'm
wondering how those would work (or whether they would even scale) with
~100 sources, as looks to be the case in your query.
While using a query with a large number (100+) of remote SPARQL
endpoints, using the SERVICE keyword for a federated query, I have
noticed that Jena keeps waiting in the queue for slow responses to
finish up before proceeding to the next node.
Would it not be a good idea to make SERVICE a thread to speed up the
process in the query?
--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
Hello Marco,
Kind regards,
Lorenz