Re: just a thought

J.Pietschmann Tue, 13 Aug 2002 13:12:36 -0700

"Arved Sandstrom" wrote:
 > I overlooked the "PCDATA as child" case...taking that into account there is
 > no doubt that 1 child is an important case. But I am still not convinced
 > this case needs treatment different from the "2 or more children" case as
 > Oleg proposed.

I also considered what Oleg proposed before. I thought this would
require a "java.lang.Object children" field, a test whether it is
null, a FONode or a Vector (ArrayList), and an appropriate cast.
Unfortunately, in the maintenance branch the children vector is
protected and directly accessed in a lot of places. Putting the
test+cast code and the handling of the one-child-only special case
everywhere seemed to be too much work. Writing a custom iterator
for the children list would solve this, but I *hate* writing custom
iterators. (It is harder than many people think to get it both
correct and efficient.)
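
A minimal sketch of the "Object children" idea, assuming the field
is made private so that the test+cast code lives in one place only.
SketchNode is a hypothetical stand-in for FONode, and the sketch uses
generics from later Java versions for brevity; it is not FOP code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FONode; only the child storage is sketched.
class SketchNode {
    private final String name;
    // null = no children, a SketchNode = exactly one child,
    // a List<SketchNode> = two or more children.
    private Object children;

    SketchNode(String name) { this.name = name; }

    String getName() { return name; }

    @SuppressWarnings("unchecked")
    void addChild(SketchNode child) {
        if (children == null) {
            children = child;                      // first child: no list allocated
        } else if (children instanceof SketchNode) {
            List<SketchNode> list = new ArrayList<>(4);
            list.add((SketchNode) children);       // second child: promote to a list
            list.add(child);
            children = list;
        } else {
            ((List<SketchNode>) children).add(child);
        }
    }

    // Callers always get a read-only List and never see the Object field.
    @SuppressWarnings("unchecked")
    List<SketchNode> getChildren() {
        if (children == null) {
            return Collections.emptyList();
        }
        if (children instanceof SketchNode) {
            return Collections.singletonList((SketchNode) children);
        }
        return Collections.unmodifiableList((List<SketchNode>) children);
    }
}
```

Returning a List from getChildren() sidesteps the custom iterator,
at the price of a small wrapper object per call in the one-child case.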

 > Right, it is both adding and retrieving that needs checking. However, in the
 > adding case it is the callee that is responsible for checking; in the other
 > case it's the caller that needs to look at what it got back.

The problem is to find all the places. This is not helped by
Area having a children field too... I think the best method is
to set FONode's children field to private and compile.

An interesting optimisation could be to omit the initialisation
of variables in case they are only used in the layout of children.

 > ... Using the DOM as an analogy, where we
 > have either the Node-based view or the typed view (Elements, Attributes,
 > Text, etc), the FO operations in Fop could be all quite generic (addChild(),
 > getChildren(), hasChildren(), etc) or more targeted (addText(), addInline(),
 > addBlock(), addSimplePageMaster() etc), or hybrid, as suggested above.
 >
 > I am partial to the use of typed children, including marker children. I am
 > not a big fan of the generic approach, not any more (if I ever was). I don't
 > think a typed child approach would interfere with extensibility, and I think
 > the primary advantage is that the code is more self-documenting. Accessor
 > methods could also be more specialized. Also, the sophisticated
 > content-model checking that one needs to do with XSL is easier done, the
 > sooner you move to explicit knowledge of what you are adding to what.

Indeed. As I said, I already started to move to a more typed approach.
A more generic approach was certainly fine as long as the standard was
still in flux, but now it looks ugly. In the current FO classes, there
are far too many
    if (fo.getName().equals("foo:stuff")) {
       Stuff stuff = (Stuff)fo;
       doSomethingSpecial();
    }
or the equivalent "fo instanceof Stuff". According to all OO books
and my personal feeling, every "instanceof" or "getName().equals()"
is suspect as well as every type cast except casting container get()s
(which doesn't mean to get rid of the suspect stuff at all cost).
I added addSimplePageMaster(), addTableBody() and so on to FONode with
a default implementation throwing a FOException with a
getName()+" can't have <stuff>" message and for example LayoutMasterSet
implementing the real adding. This also (hopefully) allows an easier
and proper check for (most of) the constraints imposed by the FO schema
and the rest of the spec.
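
A rough sketch of the typed-add pattern described above. The classes
are simplified, hypothetical stand-ins that merely share names with
FOP classes; the masterName field, the masterCount() helper and the
duplicate-name check are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in; not the real FOP exception class.
class FOException extends Exception {
    FOException(String message) { super(message); }
}

abstract class FONode {
    abstract String getName();

    // Default: reject the child. Only FOs that allow it override this.
    void addSimplePageMaster(SimplePageMaster master) throws FOException {
        throw new FOException(getName() + " can't have fo:simple-page-master");
    }
}

class SimplePageMaster extends FONode {
    final String masterName;   // assumed field, for illustration
    SimplePageMaster(String masterName) { this.masterName = masterName; }
    @Override String getName() { return "fo:simple-page-master"; }
}

class LayoutMasterSet extends FONode {
    private final Map<String, SimplePageMaster> masters = new HashMap<>();

    @Override String getName() { return "fo:layout-master-set"; }

    // The real adding, plus one content constraint checked at add time.
    @Override
    void addSimplePageMaster(SimplePageMaster master) throws FOException {
        if (masters.containsKey(master.masterName)) {
            throw new FOException("duplicate master-name " + master.masterName);
        }
        masters.put(master.masterName, master);
    }

    int masterCount() { return masters.size(); }
}

// Inherits the rejecting default, so no instanceof or name test is needed.
class Block extends FONode {
    @Override String getName() { return "fo:block"; }
}
```

With this, new Block().addSimplePageMaster(...) fails with
"fo:block can't have fo:simple-page-master" without any type test in
the caller.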
There are a few stumbling blocks though. One was the current table
implementation, which I couldn't quite understand. After setting
columns, header, body and footer separately, it no longer worked.
Another problem was the, well, impedance mismatch between the content
model of the spec and the current implementation. Currently, both Block
and Inline inherit code from FObjMixed. However, there are places where
only block FOs (Block, Table,...) or certain inline FOs are allowed.
This produces a hard choice between conveniently reusing code and having
properly typed interfaces. After thinking hard about having "AbstractBlock"
and "AbstractInline" interfaces or using classes which delegate the common
processing to a FObjMixed object (which is no longer a FObj then), I
finally decided to just stick with addChildren(FObj) for most of the
flow FOs and let someone else come up with a cleaner solution.
This does not cope with the error message produced by one of the examples
in the advanced directory:
   <fo:flow>
     <fo:wrapper ...>
       <fo:block>
This produces a "text outside block area" because of the whitespace
before the fo:block start. Markers s*ck quite mightily too.

As a more humorous side note, the content model for fo:footnote is
  (fo:inline,fo:footnote-body).
and because in FOP BasicLink is a subclass of Inline, writing
  <fo:footnote><fo:basic-link>...</fo:basic-link><fo:footnote-body>...
goes undetected and probably even works. I'm not sure whether we
actually should implement the spec *that* rigidly.

Now that I already have the attention of the audience, back to the
subject of property processing:

 > Keiron Liddle wrote:
 > Does that mean we should not attempt to solve this problem?
 > Or that we should attempt to solve the problem twice independently?

Actually, this problem has been tackled several times already, in
particular if we include Peter's effort, with no really satisfying
solution yet. In part, this is due to some attempts to keep it as
generic as possible, and to some significant degree also due to the
hair-raising complexity of the matter itself.
I myself take issue with:
- Property handling is hard to understand, with a gazillion
   indirections and odd "instanceof" checks and casts and, of course,
   the XSL generated code.
- There are a lot of classes involved, sometimes with seemingly
   duplicated semantics.
- The DirectPropertyList abomination.
- In HEAD, there appears to be another batch of Trait classes which
   also appear to deal with the data that the various Property
   classes are used for.

First, there should be a clear distinction drawn between XML
attributes and FO properties.
So what's wrong with the following approach:
- Pass the sax.AttributeList to the FO's constructor.
- Have a FONode method which goes through the attribute list and
   +  gets a PropertyMaker from a hash table the same way a FObj.Maker
      is retrieved and stores it in a list, except for "font-size" and
      "font" which are processed immediately
   +  walks the list and invokes a "parse" method with the XML
      attribute value, the FO and the parent
- The parser gets the parse context from the FO and parses the
   attribute value in a more or less customised way.
- Once a property is parsed, resolve the value as far as possible and
   tell the FO to store it. Keep a bit whether it was already set for
   conflicts with shorthands which may be evaluated later and for the
   "get-nearest-specified value". The latter information can be
   discarded in the FO's end() method (which is underutilised anyway).
For many properties there should be no need to actually store most
of the data types which can be specified in an XML attribute; for
example, font-size can always be resolved to an absolute value. Bad
cases are, for example, alignment-adjust, which must still store one
of an enum, an absolute length or a percentage.
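
The steps above could be sketched roughly as follows. Everything here
is hypothetical: the Fo class, the PropertyMaker signature, the plain
Map standing in for the SAX attribute list, the 12pt fallback and the
two supported units are assumptions chosen to keep the sketch short,
not FOP's API. The "font" shorthand is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical FO holding already resolved values only.
class Fo {
    double fontSizePt = 12.0;      // assumed fallback, not taken from the spec
    double startIndentPt = 0.0;
    // "Was explicitly set" bits; only needed until end().
    final Set<String> specified = new HashSet<>();

    void end() { specified.clear(); }
}

// Hypothetical maker: parses the raw value and stores the result in the FO.
interface PropertyMaker {
    void parse(String rawValue, Fo fo, Fo parent);
}

class PropertySketch {
    static final Map<String, PropertyMaker> MAKERS = new HashMap<>();
    static {
        MAKERS.put("font-size", (raw, fo, parent) -> {
            // "em" on font-size is relative to the parent's font size.
            double base = parent != null ? parent.fontSizePt : 12.0;
            fo.fontSizePt = toPoints(raw, base);
            fo.specified.add("font-size");
        });
        MAKERS.put("start-indent", (raw, fo, parent) -> {
            // "em" elsewhere is relative to the FO's own font size.
            fo.startIndentPt = toPoints(raw, fo.fontSizePt);
            fo.specified.add("start-indent");
        });
    }

    // Resolves "10pt" or "1.5em" to points; other units are omitted here.
    static double toPoints(String raw, double emBase) {
        String v = raw.trim();
        String number = v.substring(0, Math.max(0, v.length() - 2));
        if (v.endsWith("pt")) return Double.parseDouble(number);
        if (v.endsWith("em")) return Double.parseDouble(number) * emBase;
        throw new IllegalArgumentException("unsupported length: " + raw);
    }

    static void processAttributes(Map<String, String> attrs, Fo fo, Fo parent) {
        List<Map.Entry<String, String>> deferred = new ArrayList<>();
        for (Map.Entry<String, String> attr : attrs.entrySet()) {
            PropertyMaker maker = MAKERS.get(attr.getKey());
            if (maker == null) continue;           // unknown attribute: skipped here
            if (attr.getKey().equals("font-size")) {
                maker.parse(attr.getValue(), fo, parent);   // processed immediately
            } else {
                deferred.add(attr);                // parsed once font-size is known
            }
        }
        for (Map.Entry<String, String> attr : deferred) {
            MAKERS.get(attr.getKey()).parse(attr.getValue(), fo, parent);
        }
    }
}
```

Handling font-size before the deferred list is what lets a
start-indent of "2em" resolve to an absolute length even if it
appears before font-size in the attribute list.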

There are, naturally, concerns that if every property is stored in the
FO, it becomes large. An idea to solve this is to create bundles of
properties which are likely to change together, as already partially
done in FontState and other objects, check after property resolution
whether a bundle with the same values has already been used elsewhere
and reuse this. Some of these bundles, in particular FontState and border
settings can be passed through the areas to the renderer, which might
even improve or simplify reuse of objects there.
I think I posted already that for the "franklin_2pageseqs" example
more than a hundred FontState objects are created, while there are
only three different combinations of values.
The problematic point is to choose the bundles wisely: if they hold
too many or too unrelated properties, there won't be much reuse; if
they are too small, there is no gain because the references to the
property
bundles still take memory. I had to abort a few attempts at designing
them because my brain seems to be too small to handle this :-(
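
The reuse check for bundles could look roughly like this. FontBundle
is a hypothetical, much reduced stand-in for FontState with three
assumed fields, and BundleCache is an invented name; the point is
only the lookup of an equal, already used bundle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable bundle of properties that tend to change together.
final class FontBundle {
    final String family;
    final String weight;
    final int sizeMillipoints;

    FontBundle(String family, String weight, int sizeMillipoints) {
        this.family = family;
        this.weight = weight;
        this.sizeMillipoints = sizeMillipoints;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof FontBundle)) return false;
        FontBundle b = (FontBundle) other;
        return sizeMillipoints == b.sizeMillipoints
                && family.equals(b.family)
                && weight.equals(b.weight);
    }

    @Override
    public int hashCode() {
        return Objects.hash(family, weight, sizeMillipoints);
    }
}

// Keeps one canonical instance per distinct combination of values.
class BundleCache {
    private final Map<FontBundle, FontBundle> canonical = new HashMap<>();

    // Returns the bundle seen first with these values, or registers this one.
    FontBundle intern(FontBundle candidate) {
        FontBundle existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() { return canonical.size(); }
}
```

The candidate is still allocated before the lookup, so this saves
retained memory rather than allocations, and it only works if the
bundles are immutable once interned.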

I tried to implement this for text decorations (TextState). There
can be at most 8 different value combinations (disregarding
"blink"), so I preconstructed these and had the PropManager select
and return the appropriate TextState object. Interestingly, this
seemed to *increase* memory consumption of the test run, even though,
of course, far fewer TextState objects are constructed. I'm still
stumped.
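
For reference, a minimal sketch of the preconstruction, assuming the
three decorations underline, overline and line-through are encoded as
bits. SharedTextState and its methods are hypothetical names, not the
TextState or PropManager code in FOP.

```java
// Hypothetical immutable text decoration state with 2^3 = 8 shared instances.
final class SharedTextState {
    private static final int UNDERLINE = 1;
    private static final int OVERLINE = 2;
    private static final int LINE_THROUGH = 4;

    // One instance per bit combination, built once at class load time.
    private static final SharedTextState[] ALL = new SharedTextState[8];
    static {
        for (int bits = 0; bits < ALL.length; bits++) {
            ALL[bits] = new SharedTextState(bits);
        }
    }

    private final int bits;

    private SharedTextState(int bits) { this.bits = bits; }

    // Selects the preconstructed instance; never allocates.
    static SharedTextState of(boolean underline, boolean overline, boolean lineThrough) {
        int index = (underline ? UNDERLINE : 0)
                | (overline ? OVERLINE : 0)
                | (lineThrough ? LINE_THROUGH : 0);
        return ALL[index];
    }

    boolean hasUnderline() { return (bits & UNDERLINE) != 0; }
    boolean hasOverline() { return (bits & OVERLINE) != 0; }
    boolean hasLineThrough() { return (bits & LINE_THROUGH) != 0; }
}
```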

Comments?

J.Pietschmann
