[
https://issues.apache.org/jira/browse/JENA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430431#comment-17430431
]
Rob Vesse commented on JENA-2173:
---------------------------------
FWIW this was already possible with existing ARQ machinery, see
http://jena.apache.org/documentation/javadoc/arq/org/apache/jena/riot/lang/PipedRDFIterator.html
and its related classes, e.g.
{code}
final PipedRDFIterator<Triple> iter = new PipedRDFIterator<Triple>();
// Create a runnable for the parser
Runnable runParser = new Runnable() {
    @Override
    public void run() {
        PipedRDFStream<Triple> stream = new PipedTriplesStream(iter);
        //@formatter:off
        RDFParser parser = RDFParserBuilder.create()
                                           .base(file != null ? file.toURI().toString() : null)
                                           .lang(lang)
                                           .source(input)
                                           .build();
        //@formatter:on
        parser.parse(stream);
        LOGGER.info("Parsing completed OK");
    }
};
// Submit the runnable for execution via an ExecutorService, or start a thread manually
// Consume iterator
while (iter.hasNext()) {
    // next() must be called to advance; otherwise this loop never terminates
    Triple triple = iter.next();
    // Do something with the triple
}
{code}
This embodies a slightly different usage pattern, though: it creates a
buffering iterator between the parser thread and the caller thread, which the
caller then consumes, as opposed to explicitly pushing batches back to the
caller as your new code does.
And I totally agree that the overheads outweigh the benefits in many cases.
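To make the buffering-iterator pattern concrete without pulling in Jena, here is a minimal JDK-only sketch. It is an analogy rather than how PipedRDFIterator is actually implemented: the names BufferedHandoff, HandoffDemo, send and finish are hypothetical, and the strings stand in for parsed triples. A producer thread plays the role of the parser and writes into a bounded queue, while the caller thread pulls items through a plain Iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical JDK-only stand-in for a buffering iterator; not Jena code. */
final class BufferedHandoff<T> implements Iterator<T> {
    // Sentinel placed on the queue by the producer to mark end of stream.
    private static final Object END = new Object();

    private final BlockingQueue<Object> buffer;
    private Object next; // lookahead slot; null means "nothing fetched yet"

    BufferedHandoff(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: blocks while the buffer is full. */
    void send(T item) throws InterruptedException {
        buffer.put(item);
    }

    /** Producer side: signals that no more items will arrive. */
    void finish() throws InterruptedException {
        buffer.put(END);
    }

    /** Consumer side: blocks until an item or the end marker is available. */
    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = buffer.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for producer", e);
            }
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) next;
        next = null; // force a fresh take() on the following hasNext()
        return item;
    }
}

final class HandoffDemo {
    /** Runs a producer thread over the input and collects what the caller thread sees. */
    static List<String> run(List<String> input) {
        BufferedHandoff<String> iter = new BufferedHandoff<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (String s : input) {
                    iter.send(s); // stands in for the parser emitting one item
                }
                iter.finish();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> seen = new ArrayList<>();
        while (iter.hasNext()) {
            seen.add(iter.next());
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

The sketch hands over one item at a time, so every item pays the cross-thread synchronisation cost; sending batches instead amortises that cost, which is the trade-off discussed above. A production version would also need a timeout or a failure signal so the consumer cannot block forever if the producer dies before calling finish().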
> Add asynchronous parsing
> ------------------------
>
> Key: JENA-2173
> URL: https://issues.apache.org/jira/browse/JENA-2173
> Project: Apache Jena
> Issue Type: Improvement
> Components: RIOT
> Affects Versions: Jena 4.2.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 4.3.0
>
>
> Add code to parse on a separate thread and send batches of parsed items to
> the caller thread for further processing.
> This is only beneficial in certain circumstances because there is overhead in
> setup and in the passing of data between threads.