AtesComp opened a new issue, #1296: URL: https://github.com/apache/jena/issues/1296
### StreamRDFWriter Class

The following issue came up using the Maven release of Jena ARQ [4.4.0](https://mvnrepository.com/artifact/org.apache.jena/jena-arq/4.4.0). I see it was just updated to [4.5.0](https://mvnrepository.com/artifact/org.apache.jena/jena-arq/4.5.0).

In the `StreamRDFWriter` class, calling:

`public static StreamRDF getWriterStream(OutputStream output, RDFFormat format)`

causes a hang / lock-up condition. However, calling:

`public static StreamRDF getWriterStream(OutputStream output, RDFFormat format, Context context)`

with a `null` context does not hang the process, at least in the application I've developed as an extension to OpenRefine. See [RDF Transform](https://github.com/AtesComp/rdf-transform).

Reviewing the code doesn't appear to reveal any issue, as `getWriterStream(output, format)` simply calls `getWriterStream(output, format, null)`. Very odd. Perhaps a test pattern can help.

Additionally, there are some comment issues and/or possible code corrections for these functions. For `getWriterStream(OutputStream output, RDFFormat format, Context context)`, the comments declare:

> @return StreamRDF, or null if format is not registered for streaming.

There is no mention of exceptions. However, the code clearly throws one:

> if ( x == null )
> throw new RiotException("Failed to find a writer factory for "+format) ;

As documented, a `return null;` would be enough.

### StreamRDF... Classes

Some light humor... Why are these `StreamRDF...` classes in `.../riot/system/` and not in `.../riot/writer/stream/`? And what's up with `.../riot/system/stream` only holding `Locator...` classes?

### Related Documentation

There is a lot of good resource material for `StreamRDF` that could use some attention. The documentation on RIOT streaming (see [Working with RDF Streams in Apache Jena](https://jena.apache.org/documentation/io/streaming-io.html)) needs some luv'n to document the access to and use of the various stream classes.
This is particularly true for the use of `RDFFormat` vs `Lang` (`RDFFormat` seems to be the new hotness). Most of the documentation is centered around using datasets, models, and graphs. Fair enough. However, there are exigent use cases for processing large RDF datasets where the "pretty" printers just don't scale...as documented. What is needed is an iterative, streaming service that works without first loading the data into a structure (i.e., duplicating the data), whether "in memory" or "persistent". Sequentially reading in non-RDF data, processing discrete units into an RDF-compliant form, and writing (preferably in `BLOCKS` form) directly to an RDF file (or to a repository) is more performant...even if it produces some duplicate results.

Hmm, the C-coded [Serd](https://github.com/drobilla/serd) library seems to be very performant and small, and it converts to several formats. Could the code be reviewed and converted to Java to help speed up this kind of processing?

### Conclusion

Class issues...documentation issues...a little frustration. I do plan on spending some time contributing to this effort...at least the documentation part.

Thanks for Jena.
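To make the streaming use case concrete (read non-RDF records one at a time, convert each to RDF, write it out immediately, never build a model or graph), here is a minimal sketch of the pattern. It deliberately does not use Jena: it hand-writes N-Triples, a line-based format that needs no pretty printer, from hypothetical `id,name` rows. The class name `RowsToNTriples`, the `http://example.org/` namespace, and the `name` predicate are made up for illustration, and row ids are assumed to be IRI-safe. With Jena, the same loop would instead push each `Triple` into the `StreamRDF` returned by `StreamRDFWriter.getWriterStream(output, format, null)`, between calls to `start()` and `finish()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch (not Jena's API): stream "id,name" rows to N-Triples.
// Each row is converted and written immediately; no graph or model is built.
final class RowsToNTriples {
    // Made-up namespace for illustration; ids are assumed to be IRI-safe.
    static final String BASE = "http://example.org/";

    // Escape the characters N-Triples forbids unescaped inside a "..." literal.
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Read rows sequentially, write one triple per valid row, return the count.
    static long convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // blank or malformed row: skipped in this sketch
            }
            String id = line.substring(0, comma).trim();
            String name = line.substring(comma + 1).trim();
            out.write("<" + BASE + "item/" + id + "> <" + BASE + "name> \""
                    + escapeLiteral(name) + "\" .\n");
            count++;
        }
        out.flush();
        return count;
    }

    // Convenience wrapper for in-memory input, e.g. for quick checks.
    static String convertString(String input) {
        StringWriter out = new StringWriter();
        try {
            convert(new StringReader(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Because each row is written as soon as it is converted, memory use stays flat regardless of input size. The trade-off is that nothing deduplicates triples across rows, which is the duplicate-results caveat mentioned in the issue.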
