[ 
https://issues.apache.org/jira/browse/JENA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001132#comment-14001132
 ] 

Andy Seaborne commented on JENA-675:
------------------------------------

A first experiment, adding {{OutputPolicy}} directly to the 
{{WriterGraphRIOT}}/{{WriterDatasetRIOT}} interfaces as a general capability for 
writers, didn't work out very nicely.  It caused changes across all output in 
RIOT, much of which is not useful; I feel it moves responsibility for setup 
(often dictated by the standard for writing a single complete graph) away from 
the language itself.

An alternative approach is to provide detailed output setup specifically for 
writing fragments of graphs (e.g. optionally not repeating the prefixes, 
carrying the NodeToLabel map across, not doing whole-graph pretty 
formatting).

One example: writing a graph needs to write the prefixes, but writing a 
fragment of a graph might not.
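
The fragment idea can be sketched in plain Java without any Jena classes. Everything below ({{BlankNode}}, {{Triple}}, {{LabelMap}}, {{FragmentWriter}}) is a hypothetical stand-in invented for illustration, not RIOT's API; the only point is that the preamble is optional per writer and the label allocator is owned by the caller, so it can be carried across writer runs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Demo: two writer runs share one label allocator; only the first writes prefixes. */
final class FragmentWriterDemo {
    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("ex", "http://example.org/");
        LabelMap shared = new LabelMap();
        BlankNode b = new BlankNode();

        StringBuilder first = new StringBuilder();
        StringBuilder second = new StringBuilder();
        new FragmentWriter(first, prefixes, shared, true)
            .writeFragment(Arrays.asList(new Triple("ex:s", "ex:p", b)));
        new FragmentWriter(second, prefixes, shared, false)
            .writeFragment(Arrays.asList(new Triple(b, "ex:q", "ex:o")));

        System.out.print(first);
        System.out.print(second);
    }
}

/** Hypothetical blank node; equality is Java object identity. */
final class BlankNode { }

/** Hypothetical triple: each term is a pre-rendered String or a BlankNode. */
final class Triple {
    final Object s, p, o;
    Triple(Object s, Object p, Object o) { this.s = s; this.p = p; this.o = o; }
}

/** Hypothetical label allocator that outlives a single writer run. */
final class LabelMap {
    private final Map<BlankNode, String> labels = new HashMap<>();

    String labelFor(BlankNode node) {
        String label = labels.get(node);
        if (label == null) {
            label = "_:b" + labels.size();   // next unused label
            labels.put(node, label);
        }
        return label;
    }
}

/** Hypothetical fragment writer: the preamble is optional, labels are shared. */
final class FragmentWriter {
    private final StringBuilder out;
    private final Map<String, String> prefixes;
    private final LabelMap labels;
    private boolean preamblePending;

    FragmentWriter(StringBuilder out, Map<String, String> prefixes,
                   LabelMap labels, boolean writePreamble) {
        this.out = out;
        this.prefixes = prefixes;
        this.labels = labels;
        this.preamblePending = writePreamble;
    }

    void writeFragment(List<Triple> triples) {
        if (preamblePending) {               // written at most once per writer
            for (Map.Entry<String, String> e : prefixes.entrySet()) {
                out.append("@prefix ").append(e.getKey()).append(": <")
                   .append(e.getValue()).append("> .\n");
            }
            preamblePending = false;
        }
        for (Triple t : triples) {
            out.append(term(t.s)).append(' ').append(term(t.p)).append(' ')
               .append(term(t.o)).append(" .\n");
        }
    }

    private String term(Object x) {
        return (x instanceof BlankNode) ? labels.labelFor((BlankNode) x) : x.toString();
    }
}
```

In this sketch the second writer emits no {{@prefix}} line, and both outputs refer to the same blank node as {{_:b0}} because the allocator is shared rather than created per writer.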

There already are output formats that do fragment-based writing for whole 
graphs: {{RDFFormat.TURTLE_FLAT}} and {{RDFFormat.TURTLE_BLOCKS}} (and the 
similar TriG forms), using {{WriterStreamRDF???}} classes. They write the 
preamble, then write fragments.

N-Quads and N-Triples use {{WriterStreamRDFTuples}}. In this case, there is no 
preamble.

Would this be a reasonable place to build the output functions needed for 
RDF-hadoop?

Related: instead of a {{Graph}} for buffering, what about a {{StreamRDF}}? 
(cf. {{WriterStreamRDFBatched}} with a policy of same-subject, same-graph; 
there is scope for lots of different buffering strategies here, including 
size-based clumping and common subjects within a clump.)
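
As a rough illustration of one such buffering strategy, the sketch below clumps a stream of triples by consecutive same subject, with a cap on clump size. {{SubjectBatcher}} and its methods are hypothetical names made up for this example; it does not reproduce {{WriterStreamRDFBatched}} or the {{StreamRDF}} interface, and triples are plain string arrays to keep it self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Demo: print the size of each clump handed to the sink. */
final class SubjectBatcherDemo {
    public static void main(String[] args) {
        SubjectBatcher batcher = new SubjectBatcher(
            2, batch -> System.out.println(batch.get(0)[0] + " x" + batch.size()));
        batcher.triple("ex:s1", "ex:p", "\"a\"");
        batcher.triple("ex:s1", "ex:p", "\"b\"");
        batcher.triple("ex:s1", "ex:p", "\"c\"");
        batcher.triple("ex:s2", "ex:p", "\"d\"");
        batcher.finish();
    }
}

/**
 * Hypothetical stream-style buffer: collects consecutive triples that share a
 * subject and passes each clump to a sink, flushing early at maxBatchSize.
 */
final class SubjectBatcher {
    private final int maxBatchSize;
    private final Consumer<List<String[]>> sink;
    private final List<String[]> batch = new ArrayList<>();
    private String currentSubject = null;

    SubjectBatcher(int maxBatchSize, Consumer<List<String[]>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Accept one triple; flush first if the subject changed or the clump is full. */
    void triple(String s, String p, String o) {
        boolean subjectChanged = currentSubject != null && !currentSubject.equals(s);
        if (subjectChanged || batch.size() >= maxBatchSize) {
            flush();
        }
        currentSubject = s;
        batch.add(new String[] { s, p, o });
    }

    /** Signal end of stream so the last partial clump is not lost. */
    void finish() {
        flush();
    }

    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(batch));  // copy, so the sink may keep it
        batch.clear();
        currentSubject = null;
    }
}
```

A sink in this style could hand each clump to a fragment writer, so memory use is bounded by the clump size rather than by the whole graph.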


> Add and use a WriterProfile API
> -------------------------------
>
>                 Key: JENA-675
>                 URL: https://issues.apache.org/jira/browse/JENA-675
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, RIOT
>    Affects Versions: Jena 2.11.1
>            Reporter: Rob Vesse
>            Assignee: Andy Seaborne
>
> Currently we have a {{ParserProfile}} which allows specifying certain aspects 
> of input behaviour such as Prologue and Label to Node ID.
> However, we don't have a corresponding {{WriterProfile}} API; we actually have 
> a class called {{OutputProfile}} but this is never actually used anywhere.
> This would be particularly useful for languages that rely on the 
> {{NodeFormatter}} API where we can find comments such as the following:
> {quote}
> // Replace with a single "OutputPolicy"
> {quote}
> The lack of this API means we don't provide users any ability to do things 
> like control how blank node IDs are allocated.  And existing functionality we 
> do give them like providing a set of namespaces and base URI to use for 
> serialisation needs to be folded into this API.
> I know of two places where this is currently causing issues:
> * In the incoming Hadoop RDF Tools code (JENA-666) many output formats 
> currently mangle the data when outputting blank nodes because they can't 
> share a {{NodeToLabel}} instance over multiple writer runs.
> * In an internal bug at Cray we're seeing a situation where different code 
> paths lead to different presentation of blank nodes and we have no APIs to 
> allow us to control this presentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
