[jira] [Commented] (JENA-2173) Add asynchronous parsing

Rob Vesse (Jira) Tue, 19 Oct 2021 02:14:07 -0700


    [ 
https://issues.apache.org/jira/browse/JENA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430431#comment-17430431
 ]


Rob Vesse commented on JENA-2173:
---------------------------------

FWIW this was already possible with existing ARQ machinery, see 
http://jena.apache.org/documentation/javadoc/arq/org/apache/jena/riot/lang/PipedRDFIterator.html
 and its related classes

e.g.

{code}
        final PipedRDFIterator<Triple> iter = new PipedRDFIterator<Triple>();
        // Create a runnable for the parser
        Runnable runParser = new Runnable() {
            @Override
            public void run() {
                PipedRDFStream<Triple> stream = new PipedTriplesStream(iter);
                //@formatter:off
                RDFParser parser = RDFParserBuilder.create()
                        .base(file != null ? file.toURI().toString() : null)
                        .lang(lang)
                        .source(input)
                        .build();
                //@formatter:on
                parser.parse(stream);
                LOGGER.info("Parsing completed OK");
            }
        };

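        // Run the parser on another thread, either via an ExecutorService
        // or, as here, by starting a thread manually
        new Thread(runParser).start();

        // Consume the iterator on the calling thread
        while (iter.hasNext()) {
            Triple t = iter.next();
            // Do something with t
        }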
{code}

This embodies a slightly different usage pattern, though: it places a 
buffering iterator between the parser thread and the caller thread, which the 
caller then consumes, whereas your new code explicitly pushes batches back to 
the caller.
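
To make the contrast concrete, here is a rough, JDK-only sketch of the 
batch-push pattern as I understand it. It is not the JENA-2173 code and uses 
no Jena classes: AsyncBatchSketch, parseAsync, the END sentinel, the queue 
capacity of 4 and the trim() call standing in for real parsing are all 
assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; none of these names come from Jena.
public class AsyncBatchSketch {
    // Sentinel marking end of input; compared by identity, never by equals().
    private static final List<String> END = new ArrayList<>();

    /** "Parses" items on a separate thread and hands them to the caller in batches. */
    public static List<String> parseAsync(List<String> input, int batchSize)
            throws InterruptedException {
        // Bounded queue so the parser thread cannot run arbitrarily far ahead.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);

        Thread producer = new Thread(() -> {
            try {
                List<String> batch = new ArrayList<>();
                for (String item : input) {
                    batch.add(item.trim()); // stand-in for real parsing work
                    if (batch.size() == batchSize) {
                        queue.put(batch);   // one hand-off per batch, not per item
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch);       // final partial batch
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "parser");
        producer.start();

        // Caller thread: take whole batches until the sentinel arrives.
        List<String> received = new ArrayList<>();
        for (List<String> b = queue.take(); b != END; b = queue.take()) {
            received.addAll(b);
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseAsync(List.of(" a ", "b", " c"), 2)); // prints [a, b, c]
    }
}
```

The point of batching is that the cross-thread hand-off cost is paid once per 
batch rather than once per item. A buffering iterator hides that hand-off 
behind hasNext()/next() instead of exposing the batches to the caller.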

And I totally agree that the overheads outweigh the benefits in many cases.

> Add asynchronous parsing
> ------------------------
>
>                 Key: JENA-2173
>                 URL: https://issues.apache.org/jira/browse/JENA-2173
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: RIOT
>    Affects Versions: Jena 4.2.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.3.0
>
>
> Add code to parse on a separate thread and send batches of parsed items to 
> the caller thread for further processing.
> This is only beneficial in certain circumstances because there is overhead in 
> setup and in the passing of data between threads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
