Re: DOM API design (was: Problems with DOM parts)

Peter Becker Sun, 27 May 2001 02:59:49 -0700
Hello Thierry,

first some small comments, then I'll try to (a) pin our differences down 
to one small aspect of the language semantics and (b) show you possible 
solutions to the problem. I recommend reading everything before answering ;-) 
(I always tend to reply inline before reading the whole thing -- and often fix 
my first comments after reading the later parts).

Thierry Kormann wrote:

> On Friday 25 May 2001 04:24, Peter Becker wrote:
> 
> Hello,
> 
>>> I think Thierry will elaborate on the
>>> reason-why of the Transcoder API, but note that
>>> we were aware of the trade-offs and that (right or wrong ?), we decided
>>> to go with a generic interface that would accommodate various types
>>> of Transcoders. Our goal was to create a simple API that would be
>>> easy to learn and understand.
>> 
>> The API is ok, using the classes is even worse than the usual Java ;-)
> 
> 
> When I started thinking of the design of the transcoder module, my main goal 
> was to try to provide a common API for all transcoders. In fact, we 
> identified two simple solutions.
> 
> 1. Each transcoder has its own custom API. The main advantage is strong 
> typing for the various parameters. Errors will also be detected at 
> compile time.
> 
> 2. Try to design a common API for all transcoders. The main advantage is that 
> users have to learn one API and then can use all transcoders available in 
> Batik (switching them, extending them easily...).
> 
> What we have done is a mix of both solutions. You can use the generic API as 
> a first shot then switch to the custom API of a particular transcoder if you 
> need it (BTW only the generic API is documented - custom APIs are documented 
> on their own).
> 
> If you think of a generic API, there is no magic solution. Similar to the 
> RenderingHints class of the 2D API, errors (bad type for parameters...) can 
> only be detected at runtime.

A generic API makes IMHO only sense if it can be implemented in a 
generic manner. Your way seems a bad compromise to me, and often 
compromises are worse than both extremes ;-)
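
To make the trade-off concrete, here is a minimal sketch of the two styles. The class `GenericTranscoder`, the hint key "width" and both methods are hypothetical and are not Batik's actual API; the point is only that the generic call compiles with any value and fails at runtime, while the typed call cannot be misused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Batik's API: a generic hint map next to a typed setter.
class GenericTranscoder {
    private final Map<String, Object> hints = new HashMap<>();

    // Generic style: any value compiles, so a wrong type is only caught here, at runtime.
    void addHint(String key, Object value) {
        if ("width".equals(key) && !(value instanceof Float)) {
            throw new IllegalArgumentException(
                    "width must be a Float, got " + String.valueOf(value));
        }
        hints.put(key, value);
    }

    // Custom style: passing a String here is rejected by the compiler.
    void setWidth(float width) {
        hints.put("width", width);
    }

    Object getHint(String key) {
        return hints.get(key);
    }
}
```

With this sketch, `addHint("width", "100")` compiles and throws IllegalArgumentException when executed, whereas `setWidth("100")` would not compile at all.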

>> That's my point. Not only that compile-time type checking has a number
>> of advantages in itself (e.g. errors are found even if the code is not
>> executed in your tests), it was mainly the documentation aspect that
>> annoyed me. If a parameter has a specific type I expect to be able to
>> use this type, not descendants (BTW: this is IMHO the real reason why
>> the lack of parametrized classes in Java is a Bad Thing (tm): I can't
>> specify that I want a list of MyType objects). And although I have to
>> admit that my favoured way of finding the right methods and parameters
>> is JBuilder's CodeInsight feature, ...
> 
> 
> I agree that compile-time errors are best. If you need strong typing, I 
> invite you to use the custom API of the transcoder you need.
> 
> I agree that the documentation may not be up to date, but going by the 
> example, you should have guessed, no? :) If you have time to improve the 
> documentation or contribute some FAQ entries, feel free to contribute...

I'll try to submit some patches instead ;-)

>> "... For example, the image transcoders accept an SVG
>> org.w3c.dom.Document, ... as an input ..."
> 
> 
> OK the sentence is not good. The implicit thing is that an SVG 
> org.w3c.dom.Document is a org.w3c.dom.svg.SVGDocument.

That's your definition, not mine ;-)

> Sorry about that.
> 
>> I still can't see where the trade-off is. You state in multiple
>> locations to support the org.w3c.dom... interfaces but in fact you just
>> support your own implementation. If your trade-off is between
>> flexibility (the W3C interface) and ease of implementation (just your
>> implementation) then you didn't document it and led me to wrong
>> assumptions.
> 
> 
> The trade-off is that the transcoder API is *not* dedicated to a particular 
> transcoder (not only for the ImageTranscoder) though it seems that it's the 
> one people commonly use. You can take a look at the SVGTranscoder (pretty 
> printer) or the WMF transcoder... - they work fine with a generic DOM 
> implementation.

Then _they_ implement the interface, but the ImageTranscoder doesn't.

>>> org.apache.batik.util.DOMUtilities.deepCloneDocument).
>> 
>> Ahh -- that seems to be what I tried to find yesterday. Can you tell me
>> how I should have found this? Even now that I know where it is I can't
>> see this. And again: the return value of this method is the interface,
>> not the implementation but if I understand the interface right it means:
>> convert a foreign document into our format. The aspect "foreign" is
>> somehow covered by the fact that you want a DOMImplementation object but
>> why don't you state that you return SVGDocument? Keeping parameters
>> generic is useful (if you can hold your promise) but I can't see why you
>> can't state exactly what you return since you don't override a method here.
> 
> 
> We simply do not return an SVGDocument because this method is not dedicated 
> to transform a Document into an SVGDocument. We clone a Document using 
> another DOMImplementation. This method can be used to transform a Document 
> into an SVGDocument if you specify the Batik DOM Implementation - but you can 
> also use this method to transform an SVGDocument into a xerces Document for 
> example.

Oops -- I got that one wrong, sorry. Of course the signature is correct 
this way -- my fault. But a small complaint for this function: the 
parameters are called p0 and p1 -- why not something like inputDocument 
and targetImplementation? This laziness while typing usually causes 
problems later.

>> BTW: while talking about documentation -- check out
>> http://doc.trolltech.com for a really good example of how to write source
>> docu (just read some class description to know what I mean). This is the
>> way I try to do it (but usually I don't really achieve it due to the
>> common reasons. ;-) ) For the counterexample check most Java libraries,
>> e.g. AWT. Usually I can derive more information from the signature of a
>> method (which I can see after pressing the dot) than from the
>> description of the method. Since the classes themselves are not described
>> either I gave up looking into this documentation unless I want to find
>> some hierarchical relations. And Literate Programming and AWT is quite
>> different, too: I really need some documentation from time to time.
> 
> 
> I know trolltech documentation. I agree that the documentation is nice but we 
> are actually out of time to write tutorials or detailed documentation. We are 
> trying to do our best both for the Batik code and its documentation. I tend 
> to agree that much time has been dedicated to the code rather than the 
> documentation :) We will try to improve that as soon as possible.

Believe it or not: I mostly document _before_ I write an 
implementation. My documentation sometimes gets out of sync when I change 
the behaviour of a method later, but fortunately this doesn't happen too 
often. And it happens usually only if I am hacking along and then the 
code needs refactoring anyway ;-) If you have a good design and try to 
use good method and parameter names, you usually just need one or two 
sentences per method. Writing these before I write code even helps me 
focus on the task of implementing the code since I have to put the 
spec into words.

>> An API is a contract. You tell people what you want and what you give
>> back. I gave you what you claimed you wanted and you told me: "go away!
>> we don't like THIS". That's how I see it and I think it is a common view
>> with software engineers (although not with hackers). In fact I'd love to
>> have some way to state more like this -- I guess it is time for me to
>> learn Eiffel ;-)
> 
> 
> I learned Eiffel btw :) The problem is much more complicated than that, especially 
> when you are designing a toolkit. Genericity and strong typing are not good 
> friends generally (and more specifically in Java).

Since Java lacks parametrized classes.

[...]

>> Sorry if this sounds too academic and is somehow OT but switching from
>> C++/Qt to Java/Swing made me think a lot on programming languages and
>> now I rant more on Java than I ever did on C++ before ;-) At least I
>> _can_ write good code in C++ (if it only would have garbage collection).
>> I think when I finished the next release of my program I should recreate
>> my private site and put an article "Why Java is Evil" on it to avoid
>> writing too much OT stuff on lists like this ;-)
> 
> 
> Sorry to hear that. Maybe you need to spend as much time learning Java as you 
> spent studying C++ :) My feeling is that good programmers write good code 
> whatever the language is.

I don't believe in good Java code anymore ;-) When I started looking at 
Java code I thought it was pretty complicated. This has not changed in 
the last months when I took a closer look at Java; instead I tracked 
this down to a large number of design flaws in APIs and some significant 
design flaws in the language itself. If you think I am wrong I'd be 
happy if you can show me any really good Java code. By good I mean: 
easy to read and easy to extend/modify.

Enough ranting -- let's get down to the facts. First let us try to track 
down how our opinions differ. I think it is only the interpretation of a 
method signature like this:

   public void transcode( Document document, String uri,
                          TranscoderOutput output )

We probably both agree that defining the formal parameters this way 
means the method takes an instance of a class implementing 
org.w3c.dom.Document as first parameter. We disagree on the quantifier to 
use for the class: I say it should be an instance of _any_ class that 
implements org.w3c.dom.Document, you say it means just taking _some_ 
specific version.

IMHO using the existential (some) approach weakens the language in a way 
that is very dangerous. I'd like to prove that your interpretation is 
even wrong but unfortunately I can't find any clear definitions for this 
in either the Java Language Specification (2nd ed.) or the C++ bible 
(2nd ed.). But imagine someone claims to have written an SVG import 
filter for his program and when testing it you realize that he 
implemented only a small subset or maybe he even just checks for some 
specific SVG documents and loads his counterparts. Of course this is a 
kind of an SVG import but is it what you would have expected? I think 
the lack of a quantifier in both cases should be read as universal 
quantifier, i.e. "we take all org.w3c.dom.Documents" and "we import all SVG 
documents". It is a pity that this is not addressed in the 
specifications of the languages -- but maybe I just miss it. If someone 
can point me to something like this I'd be grateful.
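
The two readings can be sketched with a few toy types. All names here (`Doc`, `HouseDoc`, `ForeignDoc`, `Quantifiers`) are hypothetical stand-ins invented for illustration; they are not part of Batik or the DOM. Both methods declare the same parameter type, but only one of them honours it for every implementation.

```java
// Hypothetical types for illustration only; none of these names exist in Batik or the DOM.
interface Doc {
    String rootName();
}

// Stands in for the library's own implementation of the interface.
class HouseDoc implements Doc {
    public String rootName() { return "svg"; }
    String fastPath() { return "fast:" + rootName(); }
}

// Stands in for any other implementation of the same interface.
class ForeignDoc implements Doc {
    public String rootName() { return "svg"; }
}

class Quantifiers {
    // "Some" reading: the signature promises Doc, the body demands HouseDoc.
    static String transcodeSome(Doc doc) {
        return ((HouseDoc) doc).fastPath(); // ClassCastException for any other Doc
    }

    // "All" reading: only the interface is used, so every Doc works.
    static String transcodeAll(Doc doc) {
        return "generic:" + doc.rootName();
    }
}
```

Calling `transcodeSome(new ForeignDoc())` compiles without any warning and fails only when it runs, which is exactly the situation described above.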

A point where you IMHO are really not spec compliant is 
AbstractDocument.importNode(..). Here you claim to implement the 
importNode(..) method of org.w3c.dom.Document. The W3C Recommendation for 
DOM Level 2 says:

<cite url="http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#i-Document">
For all nodes, importing a node creates a node object owned by the 
importing document, with attribute values identical to the source node's 
|nodeName| and |nodeType|, plus the attributes related to namespaces 
(|prefix|, |localName|, and |namespaceURI|).
</cite>

It says "For all nodes..." here, not "for some nodes...". And the FAQ says:

<cite url="http://www.w3.org/DOM/faq#ownerdoc">

How can I copy a node or subtree from one document to another?
    DOM Level 2 defines an importNode() method that performs this
    operation. It is up to the implementation to do this in a standard
    way that works across implementations or in a more efficient way
    that uses knowledge of that implementation's data structures. If
    you're working with a Level 1 DOM, you have to copy the content
    manually.

</cite>

Note the part "...works across implementations..." -- you assume in the 
method that you get instances of your Node implementation, so I'd say 
your claim to implement the interface (explicitly in the JavaDoc for 
this method) is plainly wrong.
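
For reference, this is roughly what a caller relies on when using importNode(..) purely through the org.w3c.dom interfaces. It is a minimal sketch using the JAXP parser that ships with the JDK; the class `ImportNodeDemo` and its method names are made up for this example. Both documents here come from the same DOM implementation, so it shows the contract from the caller's side rather than a true cross-implementation import.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; only the org.w3c.dom and javax.xml.parsers calls are real API.
class ImportNodeDemo {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Creates an empty, namespace-aware DOM document.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a tiny source document with a single <svg> root element.
    static Document sampleSvg() {
        Document source = newDocument();
        Element root = source.createElementNS(SVG_NS, "svg");
        root.setAttribute("width", "100");
        source.appendChild(root);
        return source;
    }

    // Deep-copies the root of source into target; the copy is owned by target.
    static Node importRoot(Document source, Document target) {
        Node copy = target.importNode(source.getDocumentElement(), true);
        target.appendChild(copy);
        return copy;
    }
}
```

After `importRoot(source, target)` the returned node reports `target` as its owner document and keeps the node name and namespace URI of the original, while the source document is left untouched.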

Let's get more constructive...

You asked what you can do. OK -- I'll try to explain what I might do:

Version 1:
========
Use the generic API only when it is fully implemented (i.e. with the 
"all" quantifier on the class/interface type), use more specific APIs 
(and only them) otherwise.

You correctly stated that this introduces more APIs a user of your 
library has to know, but I _much_ prefer having to use more interfaces 
than having one interface that behaves differently depending on the 
context. The latter defeats IMHO much of the usefulness of an interface 
definition and I'd like to call it wrong (unfortunately I can't prove 
it). But it is not a good solution, so let's go further to...

Version 2:
========
Instead of telling the user to convert the document, just do it 
yourself. You do the RTTI anyway, so there is no additional overhead if 
you get the right implementation and if not the method will still work. 
Of course there is a hidden performance issue but this can be documented 
-- and in my case I just can't avoid the copy anyway, so I have to live 
with this performance drawback.

This would implement the full interface, my code would have worked with 
my first try and I'd have saved something like 10h now (including this 
discussion, which is of course unfair *g*). But maybe we can even do better:

Version 3:
========
Try to avoid using specific parts of your implementation. If the input 
can be converted automatically into the form you want, then there _is_ 
some way to use the generic interface to get the result. Track down the 
differences, avoid them where you can and if you can't avoid all 
problems, try to split the code at the deeper level. Of course this can 
have some serious impact on the performance since it might involve RTTI 
or other checks in lower methods. If this happens, implement the generic 
method interface in a generic way and add this to your implementation:

   public void transcode( SVGOMDocument document, String uri,
                          TranscoderOutput output )

which keeps the existing implementation. If I understand Java correctly, 
this is valid overloading and the compiler will automatically prefer the 
more specific implementation, i.e. if the caller passes a reference declared 
as SVGOMDocument, the additional method will be called. Drawback is of course that this 
adds redundancy to the code and thus additional maintenance overhead. 
But I think your library addresses a large number of clients and so you 
should compare your inconvenience with the inconvenience of dozens, 
maybe hundreds of other people. If you don't want to -- that's fair 
since it is your code that you write for free. But don't expect your 
customers to be happy with your decision ;-)
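
One caveat on the overloading idea is worth a small sketch: Java selects between overloads at compile time, from the declared type of the argument, not from the runtime class of the object. The types below (`GenericDoc`, `SpecificDoc`, `OverloadDemo`) are hypothetical stand-ins for org.w3c.dom.Document and SVGOMDocument, chosen so the example runs without Batik.

```java
// Hypothetical stand-ins for org.w3c.dom.Document and Batik's SVGOMDocument.
interface GenericDoc {}

class SpecificDoc implements GenericDoc {}

class OverloadDemo {
    // Generic overload: must work for every GenericDoc.
    static String transcode(GenericDoc document) {
        return "generic";
    }

    // Specific overload: chosen only when the argument's declared type is SpecificDoc.
    static String transcode(SpecificDoc document) {
        return "specific";
    }

    // The object is a SpecificDoc, but the variable is declared as the interface.
    static String viaInterfaceReference() {
        GenericDoc document = new SpecificDoc();
        return transcode(document);
    }

    // Same object, declared with its concrete type.
    static String viaClassReference() {
        SpecificDoc document = new SpecificDoc();
        return transcode(document);
    }
}
```

So a caller who holds the document in a variable of the interface type still ends up in the generic method, and the generic method would need its own instanceof check if it wants to hand such documents over to the fast path.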

================================

Let's try to implement version 2, which should be really as easy as this:

<code fragment>
protected void transcode(Document document,
                         String uri,
                         TranscoderOutput output)
        throws TranscoderException {

    SVGDocument svgDoc;
    if (!(document instanceof SVGOMDocument)) {
        svgDoc = (SVGDocument) DOMUtilities.deepCloneDocument(
                document, new SVGDOMImplementation());
    } else {
        svgDoc = (SVGDocument) document;
    }
    SVGSVGElement root = svgDoc.getRootElement();

</code fragment>

Unfortunately the root element of the deepCloned document seems _not_ to 
be a valid SVGSVGElement, this gives us again a nice runtime error: 
ClassCastException in line 519 of SVGOMDocument. Sorry -- my motivation 
is not high enough to debug this. I'll stick with my temp files instead 
-- not elegant but works ;-)

If you would like some further comments on specific questions I'd love to help 
(just mail me directly) but unfortunately it seems to me that getting 
rid of your current problems will be more work than I want to invest. 
Whenever I try to solve or just work around a problem related to the lack 
of type safety I find more of them :-( It seems to me that you are 
learning the hard way that static type safety is more than just a nice 
thing to have. Sorry if this sounds arrogant but I had my lessons before ;-)

BTW: one of the things I don't like in Java is that they just can't do 
things in a straightforward way. Why do I always have to use Factory Methods or 
like in your case additional utilities? Instead of using the 
DOMUtilities helper class, exactly the same result could be achieved by 
using a cast constructor. And casting a generic Document to your 
implementation seems to be a very useful thing to have. Of course it 
makes sense to reuse a generic implementation like the one in the 
DOMUtilities class, but why don't you just do something like this:

   public SVGOMDocument( Document genericDoc )
   {
       // initialize here
       _documentElement = importNode( genericDoc.getDocumentElement(), true );
   }

The importNode(..) method can be implemented in a very high class, thus 
allowing reuse of the code. Using Factory Methods for parts of it would 
be very useful, since it allows refinement in derived classes when 
needed. But I'd guess >95% of your users don't need it and don't want to 
care about your abstraction layers. Don't force them to use them for 
every simple thing they want to do. This is one of the worst habits in 
the Java world (another essay I want to write: "Don't overuse Design 
Patterns" -- the GoF book is one of my favoured recommendations but many 
people misuse them).

Using this approach I could get your implementation wherever I want by 
just calling

   new SVGOMDocument( myW3CDoc )

which is IMHO much straighter than using additional classes.
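
A runnable sketch of the same idea, using only the DOM that ships with the JDK. The class `OwnedDocument` is hypothetical and merely stands in for SVGOMDocument; it assumes the source document has a root element and does no error handling beyond that.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical wrapper, not Batik's SVGOMDocument: owns a private copy of any DOM document.
class OwnedDocument {
    private final Document copy;

    // Conversion constructor: accepts any org.w3c.dom.Document implementation.
    // Assumes genericDoc has a document element.
    OwnedDocument(Document genericDoc) {
        copy = newEmptyDocument();
        copy.appendChild(copy.importNode(genericDoc.getDocumentElement(), true));
    }

    // Creates an empty document with the JDK's default DOM implementation.
    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    Document document() {
        return copy;
    }
}
```

A caller then writes `new OwnedDocument(myW3CDoc)` and never needs to know which helper class performs the copy.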

Of course this approach often leads to one of Java's other serious 
issues: no real multiple inheritance. You can't use this approach in an 
interface (no implementations allowed) but you often would like to 
combine more than one of these generic code approaches to one 
implementation. (Probably one of the reasons why the Java people started 
using Abstract Factory everywhere.) In your case having a simple (i.e. 
tree) hierarchy should be ok.

I hope there was enough real information in this to help you and be 
assured that I would talk in the same way about my code from 2, maybe 
even 1 year(s) ago (sometimes maybe less) -- and I guess in 2 years it 
will be the same with the code I write now. "The only good code you 
wrote is the code you wrote today." I think you made some mistakes in 
your design but I also think this is perfectly normal and that constant 
refactoring should be an important aspect of every software development 
process. It is always easier to see problems once they have occurred ;-)

HTH,
   PeterB


PS: if you should think I am too idealistic here and that the things I 
describe are not "real world" applicable: I wrote Qt programs for more 
than 2 years and I learned to dislike C++ for various reasons, but I 
still love Qt. Sometimes there are small issues but the general design 
is great and unlike Sun they have really good support and listen to you 
if you complain about design issues. <disclaimer>I am not affiliated 
with Trolltech</disclaimer> ;-)

