Marco

HTTP connection contention caused by the default Apache HTTP Client 
settings is probably also a factor.  Even if SERVICE were fully 
multi-threaded, this would still be an issue.
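
To illustrate the contention point, here is a rough sketch that uses only the 
JDK.  It is a simulation, not Jena code: the Semaphore stands in for the HTTP 
connection pool, the sleep stands in for a slow remote endpoint, and the class 
and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolContentionSketch {

    /**
     * Runs `calls` simulated SERVICE requests, one thread each, against a pool
     * of `maxConnections` permits. Returns the highest number of requests that
     * were in flight at the same time.
     */
    public static int peakInFlight(int calls, int maxConnections) throws Exception {
        Semaphore pool = new Semaphore(maxConnections); // stand-in for the connection pool
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService threads = Executors.newFixedThreadPool(calls);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                pending.add(threads.submit(() -> {
                    pool.acquire();           // blocks while the pool is exhausted
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);     // pretend to wait on a remote endpoint
                    } finally {
                        inFlight.decrementAndGet();
                        pool.release();
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get();                      // wait for completion, surface failures
            }
        } finally {
            threads.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak in flight, pool of 2:  " + peakInFlight(50, 2));
        System.out.println("peak in flight, pool of 50: " + peakInFlight(50, 50));
    }
}
```

However many threads are started, the number of requests in flight never 
exceeds the pool size, so the remaining threads just queue for a connection.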

You may want to customise the HTTP client configuration to better match your 
use case - 
https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/riot/web/HttpOp.html#setDefaultHttpClient-org.apache.http.client.HttpClient-
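
As a minimal sketch of what that could look like, assuming a Jena 3.x release 
where HttpOp accepts an Apache HttpClient 4.x instance (as in the Javadoc 
linked above).  The connection limits are placeholder values rather than 
recommendations, and the class name is made up for this example:

```java
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.jena.riot.web.HttpOp;

public class FederationHttpSetup {
    public static void main(String[] args) {
        // Placeholder limits: size them to the number of endpoints you query
        // and to how many concurrent requests each endpoint can tolerate.
        HttpClient client = HttpClients.custom()
                .setMaxConnTotal(200)     // connections across all endpoints
                .setMaxConnPerRoute(10)   // connections to any single endpoint
                .build();

        // Install as the process-wide default for Jena's HTTP operations.
        // Do this once at startup, before any query is executed.
        HttpOp.setDefaultHttpClient(client);
    }
}
```

With 100+ distinct endpoints the total limit is likely to matter more than the 
per-route limit, since each endpoint is its own route.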

Rob

On 17/06/2019, 09:33, "Lorenz" <[email protected]> wrote:

    At least it looks like a good time to discuss it given that there is
    already some ongoing work on the SERVICE feature?
    
    The mailing list topic is called "Batching federated calls using VALUES
    block", initial question was in April [1], latest status mail w.r.t.
    external contribution was this month [2]
    
    [1] https://lists.apache.org/thread.html/ebfbeb950d43ef1f92057c1c4de12bb42f3db1f4a7afc6601243c9c2@%3Cusers.jena.apache.org%3E
    [2] https://lists.apache.org/thread.html/7d85cad8dc54c0bf8fde73ec879a3520127d1f8f70192785d2623874@%3Cusers.jena.apache.org%3E
    
    > OK that's useful Lorenz, thank you. I see AKSW is evaluating a number of
    > solutions here
    >
    > https://svn.aksw.org/papers/2017/FedEval-summary/public.pdf
    >
    > Since fuseki is thread-safe one can certainly delegate the query
    > segmentation to the application logic and issue multiple queries to
    > individual (fuseki or any other) endpoints concurrently.
    >
    > The use case here is to work with a large sharded dataset from one query.
    > Latency is currently not of the essence for the use case, but could be
    > mitigated by hoarding nodes on the same network segment.
    >
    > I just wonder if the threading of SERVICE would require any significant
    > rewrite of ARQ or if this is already an encapsulated process that lends
    > itself to threading.
    >
    >
    >
    > On Mon, Jun 17, 2019 at 6:36 AM Lorenz <[email protected]> wrote:
    >
    >> Honestly, with that extensive use of SERVICE feature it clearly would
    >> make sense to make use of parallel execution. Never heard of such a
    >> query, but sounds like fun.
    >>
    >> What is the use-case here? Can you give some insights? Are all of them
    >> remote SPARQL services?
    >>
    >> By the way, did you ever consider or even try one of the existing
    >> federated query engines like FedX, ANAPSID, HIBISCUS, etc. ? I'm
    >> wondering how those would work (if even scale) with ~100 sources like it
    >> looks to be the case in your query?
    >>
    >>> While using a query with a large number (100+) of remote sparql
    >>> endpoints, using the SERVICE keyword for a federated query, I have
    >>> noticed that Jena keeps waiting in the queue for slow responses to
    >>> finish up before proceeding to the next node.
    >>>
    >>> Would it not be a good idea to make SERVICE a thread to speed up the
    >>> process in the query?
    >>>
    >>>
    >>
    >> --
    >> Lorenz Bühmann
    >> AKSW group, University of Leipzig
    >> Group: http://aksw.org - semantic web research center
    >>
    >>
    
    Hello Marco,
    
     
    
    
    
    Kind regards,
    Lorenz
    
    -- 
    Lorenz Bühmann
    AKSW group, University of Leipzig
    Group: http://aksw.org - semantic web research center
    
    



