Aklakan opened a new issue, #1477:
URL: https://github.com/apache/jena/issues/1477

   ### Version
   
   4.6.0-SNAPSHOT
   
   ### Feature
   
   Issues:
   - Closing the iterators returned by AsyncParser does not abort the parsing process. In fact, repeatedly abandoning iterators causes parsing threads to silently pile up.
   - AsyncParser's default chunk size of 100K tuples introduces a long delay before the first tuple is delivered, which makes it unsuitable for content probing.
   - The `EltStreamRDF` type is private. As mentioned in [JENA-2309](https://issues.apache.org/jira/browse/JENA-2309), these events would be useful in a Hadoop/Spark setting to scan for prefixes, stopping the parser as soon as only data is being seen.
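
If parsing events were public, the prefix-scanning use case above could look roughly like the sketch below. It is not Jena code: `Evt`, `Prefix`, `Triple` and `scanPrefixes` are hypothetical stand-ins for the proposed public event type, and the input is a plain in-memory iterator rather than a running parser.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of prefix scanning over parser events (not Jena's actual API).
final class PrefixScan {
    // Stand-ins for the proposed public event type; the real EvtStreamRDF may look different.
    interface Evt {}
    record Prefix(String prefix, String iri) implements Evt {}
    record Triple(String s, String p, String o) implements Evt {}

    // Collects prefix declarations and stops consuming at the first data event.
    // Heuristic: prefix declarations that appear after the first triple are ignored.
    static Map<String, String> scanPrefixes(Iterator<? extends Evt> events) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        while (events.hasNext()) {
            Evt e = events.next();
            if (e instanceof Prefix p) {
                prefixes.put(p.prefix(), p.iri());
            } else {
                break; // data reached: the caller can now close the iterator and stop parsing
            }
        }
        return prefixes;
    }
}
```

The early `break` is only worthwhile if closing the iterator really stops the parser, which is the first issue listed above. Turtle and TriG allow `@prefix` directives anywhere in a document, so stopping at the first triple is a trade-off rather than an exact scan.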
   
   I am about to create a PR with the following improvements:
   - Changed the AsyncParser API to return `IteratorCloseable`s whose `close()` method actually cancels parsing.
   - Added a public E**v**tStreamRDF interface for parsing events, with the existing private E**l**tStreamRDF as the internal data object. The naming is up for discussion :)
   - Added a Builder that gives control over chunk and queue sizes: `AsyncParserNew.Builder.of(in, Lang.TRIG, null).setChunkSize(100).asyncParseQuads();`. The builder also has `asyncParseIterator`, which returns `IteratorCloseable<EvtStreamRDF>`.
   - If a parser fails, all remaining parsers are still started, but with a destination in an 'interrupted' state, so that they can close their resources.
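
The first and third changes (closeable iterators that cancel parsing, and configurable chunk and queue sizes) rest on a common producer/consumer pattern. Below is a minimal plain-JDK sketch of that pattern, assuming the parser can be modelled as a `Consumer<Consumer<T>>` that pushes items into a sink. `AsyncChunkedIterator`, `start` and `isProducerAlive` are hypothetical names, not the API of the PR. The key point is that `close()` interrupts the producer thread, so its blocked or next `queue.put` fails and the thread exits instead of piling up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CancellationException;
import java.util.function.Consumer;

// Hypothetical sketch, not Jena's API: a background "parser" fills chunks handed over via a bounded queue.
final class AsyncChunkedIterator<T> implements Iterator<T>, AutoCloseable {
    private final int chunkSize;
    private final BlockingQueue<List<T>> queue;
    private final List<T> end = new ArrayList<>();   // sentinel chunk, compared by identity
    private final Thread producer;
    private volatile RuntimeException error;         // failure raised by the source, if any
    private List<T> chunk = new ArrayList<>();       // touched by the producer thread only
    private Iterator<T> current = Collections.emptyIterator(); // consumer-side state
    private boolean finished = false;
    private AsyncChunkedIterator(Consumer<Consumer<T>> source, int chunkSize, int queueSize) {
        this.chunkSize = chunkSize;
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.producer = new Thread(() -> produce(source), "async-parser");
        this.producer.setDaemon(true);
    }
    // Small chunks give low latency to the first item; the queue bounds how far parsing runs ahead.
    static <T> AsyncChunkedIterator<T> start(Consumer<Consumer<T>> source, int chunkSize, int queueSize) {
        AsyncChunkedIterator<T> it = new AsyncChunkedIterator<>(source, chunkSize, queueSize);
        it.producer.start();
        return it;
    }
    private void produce(Consumer<Consumer<T>> source) {
        try {
            source.accept(this::add);
            flush();                     // hand over the last, possibly partial, chunk
        } catch (CancellationException e) {
            return;                      // close() was called: exit without enqueuing anything else
        } catch (RuntimeException e) {
            error = e;                   // remembered so that hasNext() can rethrow it
        }
        try { put(end); } catch (CancellationException ignored) { /* closed meanwhile */ }
    }
    private void add(T item) {
        chunk.add(item);
        if (chunk.size() >= chunkSize) flush();
    }
    private void flush() {
        if (chunk.isEmpty()) return;
        put(chunk);
        chunk = new ArrayList<>();
    }
    private void put(List<T> batch) {
        try {
            queue.put(batch);            // blocks while the queue is full (back-pressure)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new CancellationException("parsing cancelled");
        }
    }
    @Override public boolean hasNext() {
        while (!current.hasNext()) {
            if (finished) return false;
            List<T> batch;
            try {
                batch = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CancellationException("interrupted while waiting for data");
            }
            if (batch == end) {
                finished = true;
                if (error != null) throw error;
                return false;
            }
            current = batch.iterator();
        }
        return true;
    }
    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
    // Cancels parsing: the producer's pending or next queue.put() fails, so its thread exits.
    @Override public void close() {
        finished = true;
        current = Collections.emptyIterator();
        producer.interrupt();
        try {
            producer.join(1000);         // bounded wait so that close() itself cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    boolean isProducerAlive() { return producer.isAlive(); }
}
```

With `chunkSize = 100`, a consumer can see the first item once 100 items have been produced (or as soon as a shorter input ends) instead of waiting for a 100K-tuple chunk, while `queueSize` caps how many chunks the producer can buffer ahead of the consumer. This only cancels promptly if the parser does not swallow interrupts, which is an assumption of the sketch.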
   
   
   ### Are you interested in contributing a solution yourself?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

