Aklakan opened a new issue, #1477: URL: https://github.com/apache/jena/issues/1477
### Version

4.6.0-SNAPSHOT

### Feature

Issues:

- Closing the iterators returned by `AsyncParser` does not abort the parsing process. In fact, repeatedly abandoning iterators causes parsing threads to pile up silently.
- `AsyncParser`'s default chunk size of 100K tuples introduces a long delay, which makes it unsuitable for content probing.
- `EltStreamRDF` is private. As mentioned in [JENA-2309](https://issues.apache.org/jira/browse/JENA-2309), these events would be useful in a Hadoop/Spark setting to scan for prefixes, so that the parser can be stopped as soon as only data (no more prefix declarations) is being seen.

I am about to create a PR with the following improvements:

- Changed the `AsyncParser` API to return `IteratorCloseable`s whose `close()` method actually cancels parsing.
- Added a public E**v**tStreamRDF interface for parsing events, with the existing private E**l**tStreamRDF as the internal data object. The naming is up for discussion :)
- Added a Builder that gives control over chunk and queue sizes: `AsyncParserNew.Builder.of(in, Lang.TRIG, null).setChunkSize(100).asyncParseQuads();`. The builder also has `asyncParseIterator`, which returns `IteratorCloseable<EvtStreamRDF>`.
- If a parser fails, all remaining parsers are still started, with a destination in an "interrupted" state, so that they can close their resources.

### Are you interested in contributing a solution yourself?

Yes
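The following is a minimal plain-Java sketch of the chunked producer/consumer pattern behind the first two points. It shows a `close()` that really stops the background parse, and a small, configurable chunk size. This is an illustration under stated assumptions, not Jena code. It does not use `AsyncParser`, `IteratorCloseable`, or the proposed `AsyncParserNew.Builder`. The following are all hypothetical:

- the class `ChunkedCancellableIterator`
- its `chunkSize`/`queueCapacity` parameters
- the `awaitProducerExit` helper
- modelling the parser as a `Consumer<Consumer<T>>` that pushes items into a sink

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical sketch (not Jena's AsyncParser): chunked producer/consumer whose close() stops the producer.
final class ChunkedCancellableIterator<T> implements Iterator<T>, AutoCloseable {
    private static final Object END = new Object();   // end-of-stream marker
    private final int chunkSize;
    private final BlockingQueue<Object> queue;        // holds List<T> chunks, END, or a RuntimeException
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-producer");
        t.setDaemon(true);                            // a leftover producer must not keep the JVM alive
        return t;
    });
    private final Future<?> task;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean finished;

    // 'producer' stands in for a parser that pushes each parsed item into the given sink.
    ChunkedCancellableIterator(int chunkSize, int queueCapacity, Consumer<Consumer<T>> producer) {
        this.chunkSize = chunkSize;
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
        this.task = executor.submit(() -> {
            ChunkingSink sink = new ChunkingSink();
            try {
                producer.accept(sink);
                sink.flush();                         // emit the last, possibly partial, chunk
                put(END);
            } catch (CancellationException e) {
                // close() interrupted the producer: exit quietly instead of piling up
            } catch (RuntimeException e) {
                try { put(e); } catch (CancellationException ignored) { }  // report the failure
            }
        });
    }

    // Blocking put; an interrupt sent by close() is turned into a cancellation.
    private void put(Object o) {
        try { queue.put(o); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new CancellationException("cancelled"); }
    }

    // Buffers items so the queue is touched once per chunk rather than once per item.
    private final class ChunkingSink implements Consumer<T> {
        private List<T> chunk = new ArrayList<>();
        @Override public void accept(T item) {
            if (Thread.currentThread().isInterrupted()) throw new CancellationException("cancelled");
            chunk.add(item);
            if (chunk.size() >= chunkSize) flush();
        }
        void flush() {
            if (!chunk.isEmpty()) { put(chunk); chunk = new ArrayList<>(); }
        }
    }

    @Override public boolean hasNext() {
        while (!current.hasNext()) {
            if (finished) return false;
            Object next;
            try { next = queue.take(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new CancellationException("interrupted"); }
            if (next == END) { close(); return false; }
            if (next instanceof RuntimeException) {
                close();
                throw new IllegalStateException("producer failed", (RuntimeException) next);
            }
            @SuppressWarnings("unchecked") List<T> chunk = (List<T>) next;
            current = chunk.iterator();
        }
        return true;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    // Unlike merely abandoning the iterator, close() interrupts the producer so its thread exits.
    @Override public void close() {
        finished = true;
        current = Collections.emptyIterator();
        task.cancel(true);
        executor.shutdownNow();
    }
    boolean awaitProducerExit(long timeoutMillis) throws InterruptedException {
        return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // An endless producer: without real cancellation its thread would run forever.
        ChunkedCancellableIterator<Integer> it =
            new ChunkedCancellableIterator<>(3, 2, sink -> { for (int i = 0; ; i++) sink.accept(i); });
        List<Integer> firstFive = new ArrayList<>();
        while (firstFive.size() < 5 && it.hasNext()) firstFive.add(it.next());
        it.close();
        System.out.println(firstFive);                    // [0, 1, 2, 3, 4]
        System.out.println(it.awaitProducerExit(5_000));  // true
    }
}
```

In `main`, the producer never finishes on its own. Without cancellation, its thread would keep running after the consumer stops reading. Calling `close()` after five items interrupts it, and the thread exits. With a chunk size of 3, the first items reach the consumer after only three have been produced, instead of after a full 100K-tuple chunk. The trade-off is more queue hand-offs per item, which is presumably why a large default chunk size suits bulk loading better than probing.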
