[ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657597#comment-16657597
 ] 

ASF GitHub Bot commented on ANY23-396:
--------------------------------------

Github user HansBrende commented on the issue:

    https://github.com/apache/any23/pull/122
  
    @lewismc , great to hear!
    
    After I do a bit of last-minute cleanup, I will merge this PR.
    
    Cleanup item 1: I'm renaming `TripleFormat.ExtendedCapabilities` to 
`TripleFormat.FineCapabilities`, as I think the first name is a slightly 
misleading. 
    
    Cleanup item 2: For now, I'm removing the newly added support for 
configuring a writer's charset via `Settings` (although we could take another 
look at doing this in a future issue), for 3 reasons:  
    
    1. Some XML-based writers hard-code a "encoding=utf8" declaration which 
might then conflict with user-supplied charset and produce an invalid document.
    2. If the user sets a writer's charset to US-ASCII or similar, that could 
create a problem if the writer doesn't support escaping non-ascii characters. 
(To my knowledge, only the `NTriplesWriter` and `NQuadsWriter` support this.)
    3. The default charset for every existing writer is already UTF-8, and I 
can't think of a good reason to support anything else.


> Overhaul WriterFactory API
> --------------------------
>
>                 Key: ANY23-396
>                 URL: https://issues.apache.org/jira/browse/ANY23-396
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.2
>            Reporter: Jacek Grzebyta
>            Assignee: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>
> This issue began with Jacek's observation that, in Rover, it is impossible to 
> specify a *delegating writer factory*, i.e., one that maps/filters/reduces 
> the preliminary extraction output before passing it on to the final 
> outputstream writer. Lack of this ability caused us to have to specify 
> numerous configuration flags in Rover, e.g., "--notrivial", which filters the 
> output of the extractor by removing trivial css triples prior to writing the 
> triples to their final format. Many of these flags could simply be replaced 
> by the ids of *delegating writer factories*, if we had such a capability. One 
> added advantage of that would be that then, users could specify the *order* 
> in which these modifications take place. E.g., adding a *logging* decorator 
> could take place before or after the "notrivial" decorator has been applied 
> (or both before *and* after!). Which? If we can, we should really let the 
> user decide. 
> The most obvious solution to this problem was to extend the {{WriterFactory}} 
> interface with a new {{DelegatingWriterFactory}} interface that accepts an 
> arbitrary {{TripleHandler}} rather than an {{OutputStream}} as input. 
> In doing so, it was also necessary to deprecate a few methods in 
> {{WriterFactory}} and un-deprecate them in an extending 
> {{TripleWriterFactory}} class (which takes the place of {{WriterFactory}} by 
> creating a {{TripleHandler}} from an {{OutputStream}}). This deprecation was 
> actually not too painful, first, because some of the methods were redundant 
> in the first place (e.g., {{getMimeType()}}), and second, because it 
> presented us with a perfect opportunity to add some much-needed improvements 
> to the new interface.
> The biggest improvement is the addition of {{Settings}} as a parameter to the 
> {{TripleHandler}} constructor, which will allow users to configure writers as 
> they see fit, rather than forcing, e.g., {{prettyprint=true}} on them.
> ANY23-388 perfectly illustrates this current lack of configuration ability. 
> And we fixed that issue by simply giving users {{protected}} access to the 
> underlying {{RDFWriter}} instances so that they could configure them 
> manually. However, in hindsight, this was a bad idea, as it could lead to 
> backwards compatibility issues down the line if we decide to change the 
> underlying implementation of {{RDFWriterTripleHandler}} instances. Luckily, 
> the solution to ANY23-388 was only implemented recently and is still only 
> present in the snapshot version of Any23. In my PR, I've removed that hack 
> and replaced it with {{Settings}}, which is extensible ad infinitum and won't 
> pose the same threat to backwards compatibility. 
> Another improvement is the removal of RDF4J classes from the public 
> WriterFactory API. (I replaced {{RDFFormat}} with our own {{TripleFormat}} 
> class.) As I noted in my PR, it's probably better for us to use our own 
> classes in public-facing interfaces rather than RDF4J's so that we can 
> maintain stability in the event that RDF4J changes their API, or (heaven 
> forbid) ceases to exist, or we simply want to modify the implementation. A 
> good rule of thumb for us would probably be to limit usage of RDF4J in our 
> public-facing API to the ubiquitous interfaces found in the 
> {{org.eclipse.rdf4j:rdf4j-model}} artifact (e.g. {{IRI}} and {{Literal}}), 
> since removing those would be virtually impossible without enormous backwards 
> compatibility issues.
> Since this PR is quite large and there are a multitude of new classes and new 
> behaviors (while managing to remain fully backwards-compatible with previous 
> behavior), I'm looking for feedback! Please comment with any concerns, 
> questions, or suggestions you have for improvement. 
> PR can be viewed here: https://github.com/apache/any23/pull/122
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to