[GitHub] [jena] AtesComp opened a new issue, #1296: StreamRDFWriter getWriterStream()

GitBox Sat, 07 May 2022 16:15:38 -0700


AtesComp opened a new issue, #1296:
URL: https://github.com/apache/jena/issues/1296


   ### StreamRDFWriter Class
   The following issue came up using the Maven release of Jena ARQ 
[4.4.0](https://mvnrepository.com/artifact/org.apache.jena/jena-arq/4.4.0).  I 
see it was just updated to 
[4.5.0](https://mvnrepository.com/artifact/org.apache.jena/jena-arq/4.5.0).
   
   In the StreamRDFWriter class, calling:
   `public static StreamRDF getWriterStream(OutputStream output, RDFFormat 
format)`
   causes the process to hang (lock up).  However, calling:
   `public static StreamRDF getWriterStream(OutputStream output, RDFFormat 
format, Context context)`
   with a null context does not hang the process.
   ...at least in the application I've developed as an extension to OpenRefine. 
 See [RDF Transform](https://github.com/AtesComp/rdf-transform).
   
   Reviewing the code doesn't appear to reveal any issue as 
`getWriterStream(output, format)` simply calls `getWriterStream(output, format, 
null)`.  Very odd.  Perhaps a test pattern can help.
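   
   One possible test pattern, as a sketch only: run each call on a worker 
thread and report whether it returns within a time limit.  `HangProbe` and 
`completesWithin` are hypothetical names, not Jena API, and the lambdas in 
`main` are stand-ins for the real calls.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Jena): reports whether a call returns within a time limit. */
public class HangProbe {

    /**
     * Runs the call on a daemon worker thread and waits up to the limit.
     * Returns true if the call completed (normally or by throwing), false if it timed out.
     */
    public static <T> boolean completesWithin(Callable<T> call, Duration limit)
            throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-probe");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        try {
            Future<T> result = worker.submit(call);
            result.get(limit.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true; // the call threw, so it did not hang
        } catch (TimeoutException e) {
            return false;
        } finally {
            worker.shutdownNow(); // interrupts the worker if it is still blocked
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the two overloads; replace with the real calls under test.
        boolean fast = completesWithin(() -> "done", Duration.ofSeconds(2));
        boolean stuck = completesWithin(() -> {
            Thread.sleep(60_000); // simulates a call that never returns in time
            return "never";
        }, Duration.ofMillis(200));
        System.out.println("fast call completed: " + fast);
        System.out.println("blocking call completed: " + stuck);
    }
}
```

   In an actual test, one lambda would wrap `getWriterStream(output, format)` 
and the other `getWriterStream(output, format, null)`.  A `false` result for 
only one of them would reproduce the difference outside of OpenRefine.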
   
   Additionally, there are some comment issues and / or possible code 
corrections for these functions. For:
   `getWriterStream(OutputStream output, RDFFormat format, Context context)`
   the comments declare:
   > @return         StreamRDF, or null if format is not registered for 
streaming.
   
   No mention of exceptions. However, the code clearly throws an exception:
   >        if ( x == null )
   >            throw new RiotException("Failed to find a writer factory for 
"+format) ;
   
   As documented, a `return null;` would be enough.
   ### StreamRDF... Classes
   Some light humor...
   Why are these `StreamRDF...` classes in `...riot/system/` and not in 
`.../riot/writer/stream/`?
   And what's up with `...riot/system/stream` only holding `Locator...` classes?
   ### Related Documentation
   There is a lot of good resource material for `StreamRDF` that could use some 
attention. The documentation on RIOT streaming (see [Working with RDF Streams 
in Apache Jena](https://jena.apache.org/documentation/io/streaming-io.html)) 
needs some luv'n to document the access to and use of the various stream 
classes. Particularly, the use of RDFFormat vs Lang (RDFFormat seems to be the 
new hotness).
   
   Most of the documentation is centered around using datasets, models, and 
graphs.  Fair enough.  However, there are exigent use cases for processing large 
RDF datasets where the "pretty" printers just don't scale...as documented.  An 
iterative, streaming service is needed without first loading up a structure 
(i.e., duplicating the data) whether "in memory" or "persistent".  Sequentially 
reading in non-RDF data, processing discrete units to an RDF compliant form, 
and writing (preferably in `BLOCKS` form) directly to an RDF file (or to a 
repository) is more performant...even if there are some duplicate results.
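   
   To make the read-convert-write pattern concrete, here is a dependency-free 
sketch that turns one input row at a time into an N-Triples line and writes it 
immediately.  It is not Jena code: `StreamingNTriples`, `Row`, and the 
example.org IRIs are made up for illustration, and N-Triples stands in for a 
`BLOCKS` format only because it needs no state between rows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch (not Jena code): converts rows to N-Triples one at a time. */
public class StreamingNTriples {

    /** Illustrative input unit: subject IRI, predicate IRI, and a plain string value. */
    public static final class Row {
        final String subjectIri;
        final String predicateIri;
        final String value;

        public Row(String subjectIri, String predicateIri, String value) {
            this.subjectIri = subjectIri;
            this.predicateIri = predicateIri;
            this.value = value;
        }
    }

    /** Escapes backslash, double quote, line feed, and carriage return in a literal. */
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Writes each row as soon as it is read, so only the current row is held
     * by this method. Assumes the IRIs are already valid; they are not checked.
     * Returns the number of rows written.
     */
    public static long write(Iterator<Row> rows, Writer out) throws IOException {
        BufferedWriter w = new BufferedWriter(out);
        long count = 0;
        while (rows.hasNext()) {
            Row r = rows.next();
            w.write("<" + r.subjectIri + "> <" + r.predicateIri + "> \""
                    + escapeLiteral(r.value) + "\" .");
            w.write('\n');
            count++;
        }
        w.flush(); // flush only; the caller owns the stream and closes it
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = List.of(
            new Row("http://example.org/s1", "http://example.org/label", "first"),
            new Row("http://example.org/s2", "http://example.org/label", "say \"hi\""));
        StringWriter sink = new StringWriter();
        long n = write(rows.iterator(), sink);
        System.out.print(sink);
        System.out.println("rows written: " + n);
    }
}
```

   With Jena, the `StreamRDF` returned by `getWriterStream(output, format)` 
would presumably take the place of the hand-written `Writer` logic here; the 
point of the sketch is only that nothing accumulates between rows.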
   
   Hmmm, the C coded [Serd](https://github.com/drobilla/serd) library seems to 
be very performant, small, and converts to several formats.  Could the code be 
reviewed and converted to Java to help speed this kind of processing?
   ### Conclusion
   Class issues...documentation issues...a little frustration. I do plan on 
spending some time contributing to this effort...at least the documentation 
part.
   
   Thanks for Jena.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

