[
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661141#comment-16661141
]
ASF GitHub Bot commented on ANY23-396:
--------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/any23/pull/122
> Overhaul WriterFactory API
> --------------------------
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.2
> Reporter: Jacek Grzebyta
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces
> the preliminary extraction output before passing it on to the final
> outputstream writer. Lack of this ability caused us to have to specify
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the
> output of the extractor by removing trivial css triples prior to writing the
> triples to their final format. Many of these flags could simply be replaced
> by the ids of *delegating writer factories*, if we had such a capability. One
> added advantage of that would be that then, users could specify the *order*
> in which these modifications take place. E.g., adding a *logging* decorator
> could take place before or after the "notrivial" decorator has been applied
> (or both before *and* after!). Which? If we can, we should really let the
> user decide.
> The most obvious solution to this problem was to extend the {{WriterFactory}}
> interface with a new {{DelegatingWriterFactory}} interface that accepts an
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input.
> In doing so, it was also necessary to deprecate a few methods in
> {{WriterFactory}} and un-deprecate them in an extending
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was
> actually not too painful, first, because some of the methods were redundant
> in the first place (e.g., {{getMimeType()}}), and second, because it
> presented us with a perfect opportunity to add some much-needed improvements
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the
> {{TripleHandler}} constructor, which will allow users to configure writers as
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability.
> And we fixed that issue by simply giving users {{protected}} access to the
> underlying {{RDFWriter}} instances so that they could configure them
> manually. However, in hindsight, this was a bad idea, as it could lead to
> backwards compatibility issues down the line if we decide to change the
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily,
> the solution to ANY23-388 was only implemented recently and is still only
> present in the snapshot version of Any23. In my PR, I've removed that hack
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't
> pose the same threat to backwards compatibility.
> Another improvement is the removal of RDF4J classes from the public
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}}
> class.) As I noted in my PR, it's probably better for us to use our own
> classes in public-facing interfaces rather than RDF4J's so that we can
> maintain stability in the event that RDF4J changes their API, or (heaven
> forbid) ceases to exist, or we simply want to modify the implementation. A
> good rule of thumb for us would probably be to limit usage of RDF4J in our
> public-facing API to the ubiquitous interfaces found in the
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}),
> since removing those would be virtually impossible without enormous backwards
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new
> behaviors (while managing to remain fully backwards-compatible with previous
> behavior), I'm looking for feedback! Please comment with any concerns,
> questions, or suggestions you have for improvement.
> PR can be viewed here: https://github.com/apache/any23/pull/122
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)