[ https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657268#comment-16657268 ]
ASF GitHub Bot commented on ANY23-396: -------------------------------------- Github user lewismc commented on the issue: https://github.com/apache/any23/pull/122 I've reviewed this patch a few times now and I think it is looking great. The above thread is very helpful to see how these API's changes cam about so thank you for the in-depth thought process you've engaged in. It makes a huge difference. Pull locally, tested, evaluated again.... I am +1 to merge. > Overhaul WriterFactory API > -------------------------- > > Key: ANY23-396 > URL: https://issues.apache.org/jira/browse/ANY23-396 > Project: Apache Any23 > Issue Type: Improvement > Components: core > Affects Versions: 2.2 > Reporter: Jacek Grzebyta > Assignee: Hans Brende > Priority: Major > Fix For: 2.3 > > > This issue began with Jacek's observation that, in Rover, it is impossible to > specify a *delegating writer factory*, i.e., one that maps/filters/reduces > the preliminary extraction output before passing it on to the final > outputstream writer. Lack of this ability caused us to have to specify > numerous configuration flags in Rover, e.g., "--notrivial", which filters the > output of the extractor by removing trivial css triples prior to writing the > triples to their final format. Many of these flags could simply be replaced > by the ids of *delegating writer factories*, if we had such a capability. One > added advantage of that would be that then, users could specify the *order* > in which these modifications take place. E.g., adding a *logging* decorator > could take place before or after the "notrivial" decorator has been applied > (or both before *and* after!). Which? If we can, we should really let the > user decide. > The most obvious solution to this problem was to extend the {{WriterFactory}} > interface with a new {{DelegatingWriterFactory}} interface that accepts an > arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. > In doing so, it was also necessary to deprecate a few methods in > {{WriterFactory}} and un-deprecate them in an extending > {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by > creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was > actually not too painful, first, because some of the methods were redundant > in the first place (e.g., {{getMimeType()}}), and second, because it > presented us with a perfect opportunity to add some much-needed improvements > to the new interface. > The biggest improvement is the addition of {{Settings}} as a parameter to the > {{TripleHandler}} constructor, which will allow users to configure writers as > they see fit, rather than forcing, e.g., {{prettyprint=true}} on them. > ANY23-388 perfectly illustrates this current lack of configuration ability. > And we fixed that issue by simply giving users {{protected}} access to the > underlying {{RDFWriter}} instances so that they could configure them > manually. However, in hindsight, this was a bad idea, as it could lead to > backwards compatibility issues down the line if we decide to change the > underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, > the solution to ANY23-388 was only implemented recently and is still only > present in the snapshot version of Any23. In my PR, I've removed that hack > and replaced it with {{Settings}}, which is extensible ad infinitum and won't > pose the same threat to backwards compatibility. > Another improvement is the removal of RDF4J classes from the public > WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} > class.) As I noted in my PR, it's probably better for us to use our own > classes in public-facing interfaces rather than RDF4J's so that we can > maintain stability in the event that RDF4J changes their API, or (heaven > forbid) ceases to exist, or we simply want to modify the implementation. A > good rule of thumb for us would probably be to limit usage of RDF4J in our > public-facing API to the ubiquitous interfaces found in the > {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), > since removing those would be virtually impossible without enormous backwards > compatibility issues. > Since this PR is quite large and there are a multitude of new classes and new > behaviors (while managing to remain fully backwards-compatible with previous > behavior), I'm looking for feedback! Please comment with any concerns, > questions, or suggestions you have for improvement. > PR can be viewed here: https://github.com/apache/any23/pull/122 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)