[
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658444#comment-16658444
]
ASF GitHub Bot commented on ANY23-396:
--------------------------------------
GitHub user HansBrende commented on the issue:
https://github.com/apache/any23/pull/122
@lewismc the reason I ask is that my **cleanup item 4** allows me to
specify a new method in `TripleWriter` which accepts a group as a `Resource`
rather than as an `IRI`. That's what I've done in my latest cleanup commit.
I'll leave this PR open for at least another day before merging to master
just in case anyone comes up with any further comments or concerns.
> Overhaul WriterFactory API
> --------------------------
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.2
> Reporter: Jacek Grzebyta
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces
> the preliminary extraction output before passing it on to the final
> output stream writer. Lacking this ability, we have had to specify
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the
> output of the extractor by removing trivial CSS triples before writing the
> triples to their final format. Many of these flags could simply be replaced
> by the ids of *delegating writer factories*, if we had such a capability. An
> added advantage is that users could then specify the *order* in which these
> modifications take place. E.g., a *logging* decorator could be applied
> before or after the "notrivial" decorator
> (or both before *and* after!). Which? If we can, we should really let the
> user decide.
> The most obvious solution to this problem was to extend the {{WriterFactory}}
> interface with a new {{DelegatingWriterFactory}} interface that accepts an
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input.
> In doing so, it was also necessary to deprecate a few methods in
> {{WriterFactory}} and un-deprecate them in an extending
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was
> actually not too painful: first, because some of the methods were redundant
> to begin with (e.g., {{getMimeType()}}), and second, because it
> gave us a perfect opportunity to add some much-needed improvements
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the
> {{TripleHandler}} constructor, which will allow users to configure writers as
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability.
> We fixed that issue by simply giving users {{protected}} access to the
> underlying {{RDFWriter}} instances so that they could configure them
> manually. However, in hindsight, this was a bad idea, as it could lead to
> backwards compatibility issues down the line if we decide to change the
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily,
> the solution to ANY23-388 was only implemented recently and is still only
> present in the snapshot version of Any23. In my PR, I've removed that hack
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't
> pose the same threat to backwards compatibility.
> Another improvement is the removal of RDF4J classes from the public
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}}
> class.) As I noted in my PR, it's probably better for us to use our own
> classes in public-facing interfaces rather than RDF4J's so that we can
> maintain stability in the event that RDF4J changes its API, or (heaven
> forbid) ceases to exist, or we simply want to modify the implementation. A
> good rule of thumb for us would probably be to limit usage of RDF4J in our
> public-facing API to the ubiquitous interfaces found in the
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}),
> since removing those would be virtually impossible without enormous backwards
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new
> behaviors (while managing to remain fully backwards-compatible with previous
> behavior), I'm looking for feedback! Please comment with any concerns,
> questions, or suggestions you have for improvement.
> PR can be viewed here: https://github.com/apache/any23/pull/122
>
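For readers skimming the quoted description, here is a minimal, self-contained sketch of the delegating-writer idea and of why the order of decorators matters. It is illustrative only and is not the code from the PR: `Handler` is a simplified stand-in for Any23's `TripleHandler`, and `noTrivial`, `logging`, and `chain` are hypothetical names invented for this example. The "trivial" test below is also a placeholder assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class DelegatingWriterSketch {

    /** Simplified stand-in for a triple sink; NOT Any23's actual TripleHandler. */
    interface Handler {
        void receive(String subject, String predicate, String object);
    }

    /** Hypothetical filtering decorator: drops triples we pretend are "trivial". */
    static Handler noTrivial(Handler next) {
        return (s, p, o) -> {
            // Placeholder rule (an assumption): stylesheet links count as trivial.
            if (!p.endsWith("stylesheet")) {
                next.receive(s, p, o);
            }
        };
    }

    /** Hypothetical logging decorator: records each triple, then forwards it. */
    static Handler logging(List<String> log, Handler next) {
        return (s, p, o) -> {
            log.add(s + " " + p + " " + o);
            next.receive(s, p, o);
        };
    }

    /**
     * Wraps the final writer so that the FIRST decorator listed is the first
     * one to see the extractor output, and the last one sits next to the writer.
     */
    @SafeVarargs
    static Handler chain(Handler finalWriter, UnaryOperator<Handler>... decorators) {
        Handler h = finalWriter;
        for (int i = decorators.length - 1; i >= 0; i--) {
            h = decorators[i].apply(h);
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> before = new ArrayList<>();
        List<String> after = new ArrayList<>();
        List<String> written = new ArrayList<>();

        // Order chosen by the caller: log, then filter, then log again, then write.
        Handler h = chain(
                (s, p, o) -> written.add(s + " " + p + " " + o),
                next -> logging(before, next),
                DelegatingWriterSketch::noTrivial,
                next -> logging(after, next));

        h.receive("http://example.org/", "http://www.w3.org/1999/xhtml/vocab#stylesheet",
                "http://example.org/style.css");
        h.receive("http://example.org/", "http://purl.org/dc/terms/title", "Home");

        System.out.println("seen before filter: " + before.size());
        System.out.println("seen after filter:  " + after.size());
        System.out.println("written:            " + written.size());
    }
}
```

Because each decorator only needs the next handler, not an `OutputStream`, the same decorator can appear at several positions in the chain, which is the "both before *and* after" case from the description.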