Re: FOP Memory issues (fwd from fop-users)

2007-01-15 Thread Andreas L Delmelle

On Jan 14, 2007, at 22:51, J.Pietschmann wrote:


Andreas L Delmelle wrote:
Based on Jörgs statistics, I'd say that the number of children  
will most likely never reach the level where using direct index- 
based access (ArrayList) has its benefits over traversing a tree  
of references (LinkedList).


There may be FOs, specifically fo:flow and fo:table-body, which may
have hundreds of children in real documents.

If the FOs use access functions for the children, even for adding,
each FO can implement a mechanism which suites its purposes best. In
particular, page masters and tables can store the regions in typed
fields, FOs which don't have children can get rid of the field
completely.


This happens already for some FOs. Tables for instance, override  
addChildNode() and store only bodies as their 'child nodes'. The  
header and footer have separate typed fields, the columns end up in a  
separate standard List. Behavior for markers is another example, they  
too end up in a separate collection (map).


I've already been experimenting locally with removing the List  
member, and replacing it with pointers to the neighbors. As I re- 
implemented FObj.addChildNode(), it occurred to me that FOs with a  
possibly larger amount of children would need to traverse the entire  
list to append child nodes to the end. For a handful of children,  
this costs virtually nothing, but as the list gets bigger... Could  
easily be solved though for the applicable FOs, by keeping a  
reference to the last added child, and giving it an addSibling() method.




I'm not sure whether the layout engine places any restrictions on the
access to FO children. Is it possible that access is not random?


FTM, there are only a handful of places in the code where the  
childNodes list is accessed using an index.
The only way it can be accessed by classes outside the fo package is  
through FObj.getChildNodes(), and that method returns a ListIterator,  
not the list itself. The LMs only need access to these iterators  
while creating their childLMs, which is done in forward sequential  
order, i.e.


for (Iterator i = AbstractLM.fobjIter; i.hasNext();) { ... }

[Note: these are ArrayLists we're talking about. Iterators are far  
from the best approach with lists implementing RandomAccess. Not that  
this matters much if the lists are small...]


This fobjIter is created in the constructor of the LM. IIC, this  
means that there is currently no way to, for example, instantiate the  
FlowLM as soon as the first endBlock() event occurs, and keep  
extending the flow's list of child nodes in a different thread. The  
FlowLM's fobjIter will then throw a ConcurrentModificationException().


In the structure I'm currently playing with, it would be possible to  
construct an iterator over an FObj whose child nodes are not all  
known, move to the last node in the list, put the iterator in a stack  
(or keep a reference as an instance member), continue adding nodes in  
another thread, return, and continue iterating (a call to next() will  
then return the first node that was added after the iterator was  
paused).


That would be mighty dangerous if the access to the nodes was not so  
predictable, but as it stands now, this could turn out to be a  
benefit, if documented properly.
Removal from and addition to the tail of the list are never a  
problem. The iterator will simply keep following the following- 
sibling reference until it is null.
The head of the list might become a problem if one needs a decent  
implementation of nextIndex() and previousIndex(). If we never really  
need the index, then the fact that a child was added at a previous  
position will simply make the iterator go back a few steps further  
than where it started.




Cheers,

Andreas





Re: FOP Memory issues (fwd from fop-users)

2007-01-14 Thread J.Pietschmann

Andreas L Delmelle wrote:
Based on Jörgs statistics, I'd say that the number of children will most 
likely never reach the level where using direct index-based access 
(ArrayList) has its benefits over traversing a tree of references 
(LinkedList).


There may be FOs, specifically fo:flow and fo:table-body, which may
have hundreds of children in real documents.

If the FOs use access functions for the children, even for adding,
each FO can implement a mechanism which suites its purposes best. In
particular, page masters and tables can store the regions in typed
fields, FOs which don't have children can get rid of the field
completely.

I'm not sure whether the layout engine places any restrictions on the
access to FO children. Is it possible that access is not random?

J.Pietschmann


Re: FOP Memory issues (fwd from fop-users)

2007-01-13 Thread Andreas L Delmelle

On Jan 12, 2007, at 13:00, [EMAIL PROTECTED] wrote:


snip /
   Just a quick comment on this, given the above stats you might
consider having a single Object field and switch the type of object
from the actual child (when there is a single child) to a List of
children when there are multiple children:


Also an interesting suggestion, thanks!

I'm still wondering whether the use of separate lists could not be  
avoided altogether.
Based on Jörgs statistics, I'd say that the number of children will  
most likely never reach the level where using direct index-based  
access (ArrayList) has its benefits over traversing a tree of  
references (LinkedList).


On top of that, all we really seem to need further in the code, is an  
iterator over that list, not the list itself...


I'm thinking, roughly all we need is something like:
...
class Node {
  Object parent;
  Object firstChild;
  Object[] siblings; //if there are siblings, always length=2
...
class ChildIterator
  Node currentNode;

  ChildIterator(Node n) {
currentNode = n.firstChild;
  }

  hasNext() {
return (currentNode.siblings != null)
  (currentNode.siblings[1] != null);
  }

  next() {
if (hasNext()) {
  return currentNode.siblings[1];
} else {
  throw new NoSuchElementException();
}
  }

etc.

The backing list would be defined by the pointers between the  
objects, and not exist as a separate object (list) itself.


I'm still not completely sure about my estimation, but in the picture  
painted earlier (instance count of ArrayList and Object[]), those  
extra reference(s) for the siblings could turn out to be well worth  
it, since they'd only slightly increase the instance size of already  
existing objects, but they do avoid the creation of so many new  
ArrayList instances.



Cheers,

Andreas

Re: FOP Memory issues (fwd from fop-users)

2007-01-12 Thread thomas . deweese
Hi all,

Andreas L Delmelle [EMAIL PROTECTED] wrote on 01/11/2007 05:24:36 
PM:

 On Jan 11, 2007, at 22:31, J.Pietschmann wrote:
 
  Quite some time ago I did some statistics on number of children
  of FOs, using the FOP examples and FO files from bug reports.
  The breakdown was roughly the following
   ~50% no children, mostly FOText nodes and FOs like region-body
  and page-number-citation
   ~40% one child, mostly blocks and inlines (fo:wrapper) having
  exactly one FOText node as child
   10% 2..10 children

   Just a quick comment on this, given the above stats you might
consider having a single Object field and switch the type of object
from the actual child (when there is a single child) to a List of
children when there are multiple children:

   Object child;

   Child getChild(int idx) {
if (child instanceof List) return (Child)((List)child).get(idx);
  if (idx != 0) throw new IndexOutOfBoundsException(...);
  return (Child)child;
   }

   int getNumChildren() {
if (child instanceof List) return ((List)child).size();
  return 1;
   }

   I do this in Batik for the ID HashTable which has similar stats
(you might say that the hashtable can only hold one entry but when
dealing with things like DocumentFragments and detached DOM tree's
it get's a little fuzzy what the ID rules really are).

   Admittedly the code is a little ugly, but you can hide
it from all but the implementing class.
Just something to think about it might give you a significant 
savings (might even be a tad faster...).

Re: FOP Memory issues (fwd from fop-users)

2007-01-11 Thread richardw
Andreas L Delmelle writes:
  I'd say the 80K ArrayLists are simply the childNodes lists of all  
  those FObj (TableCells and Blocks), and that those are, in most  
  cases, lists of only one element.

This is correct - for most instances,

Richard



Re: FOP Memory issues (fwd from fop-users)

2007-01-11 Thread Andreas L Delmelle

On Jan 11, 2007, at 10:02, [EMAIL PROTECTED] wrote:


Andreas L Delmelle writes:

I'd say the 80K ArrayLists are simply the childNodes lists of all
those FObj (TableCells and Blocks), and that those are, in most
cases, lists of only one element.


This is correct - for most instances,


Which brings us to another important piece of information that would  
be interesting to know: how big are each of those instances?


Note that our ArrayLists are all created using the default  
constructor (here and there, you'll find a few where an  
initialCapacity is supplied). If the majority of those lists contain  
only one element, there will still be backing Object arrays of 10  
elements (9 out of 10 references remain null).
If we'd implement our own FONodeList() and supply it with a compact()  
method to shrink the backing array to the size that is needed to  
store all childNodes, and make sure this method is called upon in  
endOfNode() when all children are known. This might again save a few...
Rough calculation, supposing 4 Bytes needed for each empty reference:  
(80K x (4 x 9)) Bytes ~ 2.8 MB wasted on nulls, IIC.



Cheers,

Andreas


Re: FOP Memory issues (fwd from fop-users)

2007-01-11 Thread J.Pietschmann

Andreas L Delmelle wrote:
Which brings us to another important piece of information that would be 
interesting to know: how big are each of those instances?


Quite some time ago I did some statistics on number of children
of FOs, using the FOP examples and FO files from bug reports.
The breakdown was roughly the following
 ~50% no children, mostly FOText nodes and FOs like region-body
and page-number-citation
 ~40% one child, mostly blocks and inlines (fo:wrapper) having
exactly one FOText node as child
 10% 2..10 children
 1% more than 10 children, mostly fo:flow, table and table-body
and a few blocks, usually wrapping other blocks.

Real world documents with more tables and inline formatting might
have more multi-child FOs.

I haven't checked whether FOText still inherits the children field
on trunk. If so, it is certainly a good idea to get rid of this
(in the maintenance branch, this had widespread implications).
The case of exactly one child might be worth optimizing too.

Two possible solutions:
A) all FO node implement a FOContainer interface, for example
 FONode childAt(int)
 int numberOfChildren()
where FOText for example would hardcode return values of null and 0.
B) Use a FOChildrenIterator interface with specific implementations
for FO nodes which can have none or exactly one child.

Furthermore, in the maintenance branch most of the more specific
FOs copied children from the generic children list into properly
typed fields before starting layout, in many cases the generic
children list could have been deleted afterwards if this wouldn't
have broken a few generic recursive algorithms like the one adjusting
available space due to footnotes. The discussion then had Keiron said
he'd even get rid of the generic children list in favour for properly
typed fields, thereby giving up some flexibility needed for
extensions. This could be implemented with either of the approaches
above too (FOContainer interface or FOChildrenIterator interface).

J.Pietschmann


Re: FOP Memory issues (fwd from fop-users)

2007-01-11 Thread Andreas L Delmelle

On Jan 11, 2007, at 22:31, J.Pietschmann wrote:


Quite some time ago I did some statistics on number of children
of FOs, using the FOP examples and FO files from bug reports.
The breakdown was roughly the following
 ~50% no children, mostly FOText nodes and FOs like region-body
and page-number-citation
 ~40% one child, mostly blocks and inlines (fo:wrapper) having
exactly one FOText node as child
 10% 2..10 children
 1% more than 10 children, mostly fo:flow, table and table-body
and a few blocks, usually wrapping other blocks.

Real world documents with more tables and inline formatting might
have more multi-child FOs.


Interesting figures...



I haven't checked whether FOText still inherits the children field
on trunk. If so, it is certainly a good idea to get rid of this
(in the maintenance branch, this had widespread implications).
The case of exactly one child might be worth optimizing too.


This was indeed altered, I don't know when, and by whom precisely  
(Glen or Finn, IIRC).

Anyways, the hierarchy is currently:

FONode
|-FOText
|-FObj

and only FObj has a protected childNodes instance member, which is a  
generic ArrayList (and as I hinted, they are all created with: new  
java.util.ArrayList(), which defaults to an initial backing Object[10]).


A FONode only holds the reference to the parent in the tree, and the  
FObj.childNodes list is only created when FObj.addChildNode() is  
called, so if there is no child element, this reference will always  
be null.



Two possible solutions:
A) all FO node implement a FOContainer interface, for example
 FONode childAt(int)
 int numberOfChildren()
where FOText for example would hardcode return values of null and 0.
B) Use a FOChildrenIterator interface with specific implementations
for FO nodes which can have none or exactly one child.


I was already thinking along the lines of creating a subclass of  
Vector, or an implementation of List, but I'm beginning to wonder if  
it wouldn't be worth it to create a link between the children...?  
Instead of holding a reference to an ArrayList, each FObj would have  
three references: parent, firstChild, nextSibling. Add that  
FOContainer interface or a FONodeIterator to navigate through it...  
could work out. 8)


The benefit would be that the average size of an FObj becomes much  
more transparent: three references, the properties and a handful of  
private helpers.



Furthermore, in the maintenance branch most of the more specific
FOs copied children from the generic children list into properly
typed fields before starting layout, in many cases the generic
children list could have been deleted afterwards if this wouldn't
have broken a few generic recursive algorithms like the one adjusting
available space due to footnotes. The discussion then had Keiron said
he'd even get rid of the generic children list in favour for properly
typed fields, thereby giving up some flexibility needed for
extensions.


In the trunk, extensions are treated very differently from what I can  
tell. Mainly thanks to Jeremias, IIC, who made extensive changes when  
implementing XMP metadata. They are stored in a separate collection  
(ExtensionAttachments), which adds its own flexibility, but also  
difficulty. I'm still not sure how one would have to write an  
arbitrary fo extension that is supposed to influence the flow of the  
layout algorithm without needing access to the deeper layout API.  
Maybe we'd need a sort of generic ExtensionLayoutManager, too...?



Andreas


Re: FOP Memory issues (fwd from fop-users)

2007-01-10 Thread richardw
Manuel Mall writes:
  On Wednesday 10 January 2007 03:42, Andreas L Delmelle wrote:
   White-space nodes? Could also be an effect of the white-space
   collapsing. Long shot, but theoretically, 'white-space-handling' in
   FOText means 'replace FOText.ca with a copy minus a few characters',
   if the originals weren't GC'ed at the time of the snapshot...
  
  I assumed the snapshot shows only reachable objects, i.e. objects not 
  eligible for garbage collection. Otherwise it could be seriously 
  misleading.

The whole thing bombs out with an OutOfMemoryError. If it didn't I
wouldn't be spending my time dealing with it :) The contract for
this explicitly says that garbage collection will have called before
then.

Richard



Re: FOP Memory issues (fwd from fop-users)

2007-01-10 Thread Andreas L Delmelle

On Jan 10, 2007, at 20:39, Andreas L Delmelle wrote:


On Jan 10, 2007, at 04:50, Manuel Mall wrote:


On Wednesday 10 January 2007 03:42, Andreas L Delmelle wrote:

snip /
Property lists themselves are no longer alive in the snapshot, it
seems. I don't suppose they are that much of a problem. They are
meant to be in scope for only a very brief interval (best case: from
PropertyListMaker.make() until FObj.bind())

Hmm, what are the 80K+ ArrayLists and their backing Object[] then?



That remains the question, of course...
If it had been the PropertyLists, there would also be 80K instances  
of StaticPropertyList.


BTW: PropertyLists use plain arrays of Property[] (not ArrayLists)



I'd say the 80K ArrayLists are simply the childNodes lists of all  
those FObj (TableCells and Blocks), and that those are, in most  
cases, lists of only one element.


I have read in relation to Java performance tuning that, if you know  
in advance which type of element will be stored in a Collection, it  
almost always pays off to write your own implementation... In this  
case, we could add a FONodeList class, that implements the  
java.util.List interface, and have it backed by a FONode[].


This could at least drastically reduce the number of casts made in  
the fo package.


Hmm, OK, this is such a good idea that I might even get started on it  
right away :)



Cheers,

Andreas



Re: FOP Memory issues (fwd from fop-users)

2007-01-10 Thread Jess Holle
For all the talk of FOP memory issues, does anyone have any information 
as to whether the problem is worse than in FOP 0.20.5 -- and, if so, by 
how much?


I'm considering upgrading what we require/bundle to 0.93 and currently 
have reflection code allowing operation against 0.93 or 0.20.5 -- 
whatever's present.  I don't want to step up to 0.93 and discover 
memory usage for things that used to work in 0.20.5 is many times higher 
or such, though...


--
Jess Holle


Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread richardw
Andreas L Delmelle writes:
  If I remember correctly, that was precisely the problem, since  
  Cliff's report consists of one giant table. It's supposed to look  
  like one uninterrupted flow, so figuring out where the page-sequences  
  should end is next to impossible... (or IOW: sorting that out kind of  
  defeats the purpose of using a formatter to compute the page-breaks) :/

That's exactly the same problem I ran up against. Hence my investigations
into properties and memory consumption. On my current setup (as per the
last patch under 41044) and using my current 16MB fo file, the object
counts are as follows:

349576 instances of class org.apache.fop.fo.properties.CondLengthProperty
145647 instances of class org.apache.fop.fo.properties.KeepProperty
126237 instances of class org.xml.sax.helpers.LocatorImpl
116521 instances of class org.apache.fop.fo.properties.SpaceProperty
87814 instances of class java.lang.Object[]
87458 instances of class java.util.ArrayList
87394 instances of class 
org.apache.fop.fo.properties.CommonBorderPaddingBackground
87394 instances of class 
org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo[]
87394 instances of class org.apache.fop.fo.properties.CondLengthProperty[]
81632 instances of class char[]
48548 instances of class org.apache.fop.fo.properties.LengthRangeProperty
42649 instances of class java.lang.String
38841 instances of class org.apache.fop.fo.properties.CommonMarginBlock
38839 instances of class org.apache.fop.datatypes.LengthBase
38839 instances of class org.apache.fop.fo.properties.PercentLength
38838 instances of class org.apache.fop.fo.flow.Block
38836 instances of class org.apache.fop.fo.flow.TableCell
9710 instances of class org.apache.fop.fo.flow.TableRow
5021 instances of class java.lang.Integer
4329 instances of class java.util.HashMap$Entry
1521 instances of class java.lang.Class
787 instances of class java.awt.Color
667 instances of class java.util.HashMap$Entry[]
658 instances of class java.util.HashMap
330 instances of class java.util.Hashtable$Entry
201 instances of class java.util.WeakHashMap$Entry
181 instances of class org.apache.fop.fo.properties.EnumProperty
160 instances of class java.util.concurrent.ConcurrentHashMap$HashEntry[]
160 instances of class java.util.concurrent.ConcurrentHashMap$Segment
160 instances of class java.util.concurrent.locks.ReentrantLock$NonfairSync
127 instances of class java.lang.String[]
116 instances of class java.util.LinkedHashMap$Entry
110 instances of class byte[]
...

As you can see, there's still a lot of value in getting the compound
properties reusable - which I still intend on doing. This won't make it
possible to handle arbitary documents, but will at least raise the bar
somewhat.

I'm also currently reading through Knuth's Digital Typography. Can anyone
point out any sections I should pay particular attention to w.r.t. FOP's
usage,

Regards,

Richard



Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Manuel Mall
On Tuesday 09 January 2007 18:13, [EMAIL PROTECTED] 
wrote:
 Andreas L Delmelle writes:
   If I remember correctly, that was precisely the problem, since
   Cliff's report consists of one giant table. It's supposed to look
   like one uninterrupted flow, so figuring out where the
   page-sequences should end is next to impossible... (or IOW:
   sorting that out kind of defeats the purpose of using a formatter
   to compute the page-breaks) :/

 That's exactly the same problem I ran up against. Hence my
 investigations into properties and memory consumption. On my current
 setup (as per the last patch under 41044) and using my current 16MB
 fo file, the object counts are as follows:

 349576 instances of class
 org.apache.fop.fo.properties.CondLengthProperty 145647 instances of
 class org.apache.fop.fo.properties.KeepProperty 126237 instances of
 class org.xml.sax.helpers.LocatorImpl
 116521 instances of class org.apache.fop.fo.properties.SpaceProperty
 87814 instances of class java.lang.Object[]
 87458 instances of class java.util.ArrayList
 87394 instances of class
 org.apache.fop.fo.properties.CommonBorderPaddingBackground 87394
 instances of class
 org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo
[] 87394 instances of class
 org.apache.fop.fo.properties.CondLengthProperty[] 81632 instances of
 class char[]
 48548 instances of class
 org.apache.fop.fo.properties.LengthRangeProperty 42649 instances of
 class java.lang.String
 38841 instances of class
 org.apache.fop.fo.properties.CommonMarginBlock 38839 instances of
 class org.apache.fop.datatypes.LengthBase 38839 instances of class
 org.apache.fop.fo.properties.PercentLength 38838 instances of class
 org.apache.fop.fo.flow.Block
 38836 instances of class org.apache.fop.fo.flow.TableCell
 9710 instances of class org.apache.fop.fo.flow.TableRow
 5021 instances of class java.lang.Integer
 4329 instances of class java.util.HashMap$Entry
 1521 instances of class java.lang.Class
 787 instances of class java.awt.Color
 667 instances of class java.util.HashMap$Entry[]
 658 instances of class java.util.HashMap
 330 instances of class java.util.Hashtable$Entry
 201 instances of class java.util.WeakHashMap$Entry
 181 instances of class org.apache.fop.fo.properties.EnumProperty
 160 instances of class
 java.util.concurrent.ConcurrentHashMap$HashEntry[] 160 instances of
 class java.util.concurrent.ConcurrentHashMap$Segment 160 instances of
 class java.util.concurrent.locks.ReentrantLock$NonfairSync 127
 instances of class java.lang.String[]
 116 instances of class java.util.LinkedHashMap$Entry
 110 instances of class byte[]
 ...


Richard,

very good stuff. I am trying to make sense of the numbers. Let me 
paraphrase the data:

A table with 4 columns and 9710 rows. Each table cell contains a block 
and there is also a block around the table. This gives us (roughly) the 
87394 CommonBorderPaddingBackground and CondLengthProperty[] instances. 
We also have an ArrayList (the property list) per formatting object and 
each ArrayList is backed by an Object[].

What are the 81632 instances of class char[]? I assume this is the text 
in the table cells. But why are there more than twice as many as there 
are table cells?

Your summary also shows 126237 instances of class 
org.xml.sax.helpers.LocatorImpl. I believe the only purpose of these 
helpers if for providing location information in error messages. Looks 
like a fairly expensive feature.

 As you can see, there's still a lot of value in getting the compound
 properties reusable - which I still intend on doing. This won't make
 it possible to handle arbitary documents, but will at least raise the
 bar somewhat.


Based on your above figures reusing identical compound property 
instances would certainly be a very useful improvement. A possible next 
step would be to reuse identical property lists. Especially in 
documents with lots of identically formatted table cells this would 
further reduce the memory footprint.

 I'm also currently reading through Knuth's Digital Typography. Can
 anyone point out any sections I should pay particular attention to
 w.r.t. FOP's usage,

 Regards,

 Richard

Manuel


Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Amin Ahmad

Perhaps the flyweight pattern, described by the GoF, may be of use to
whoever is going to look into an implementation strategy.
http://en.wikipedia.org/wiki/Flyweight_pattern gives a decent example.

amin


On 1/9/07, Manuel Mall [EMAIL PROTECTED] wrote:


On Tuesday 09 January 2007 18:13, [EMAIL PROTECTED]
wrote:
 Andreas L Delmelle writes:
   If I remember correctly, that was precisely the problem, since
   Cliff's report consists of one giant table. It's supposed to look
   like one uninterrupted flow, so figuring out where the
   page-sequences should end is next to impossible... (or IOW:
   sorting that out kind of defeats the purpose of using a formatter
   to compute the page-breaks) :/

 That's exactly the same problem I ran up against. Hence my
 investigations into properties and memory consumption. On my current
 setup (as per the last patch under 41044) and using my current 16MB
 fo file, the object counts are as follows:

 349576 instances of class
 org.apache.fop.fo.properties.CondLengthProperty 145647 instances of
 class org.apache.fop.fo.properties.KeepProperty 126237 instances of
 class org.xml.sax.helpers.LocatorImpl
 116521 instances of class org.apache.fop.fo.properties.SpaceProperty
 87814 instances of class java.lang.Object[]
 87458 instances of class java.util.ArrayList
 87394 instances of class
 org.apache.fop.fo.properties.CommonBorderPaddingBackground 87394
 instances of class
 org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo
[] 87394 instances of class
 org.apache.fop.fo.properties.CondLengthProperty[] 81632 instances of
 class char[]
 48548 instances of class
 org.apache.fop.fo.properties.LengthRangeProperty 42649 instances of
 class java.lang.String
 38841 instances of class
 org.apache.fop.fo.properties.CommonMarginBlock 38839 instances of
 class org.apache.fop.datatypes.LengthBase 38839 instances of class
 org.apache.fop.fo.properties.PercentLength 38838 instances of class
 org.apache.fop.fo.flow.Block
 38836 instances of class org.apache.fop.fo.flow.TableCell
 9710 instances of class org.apache.fop.fo.flow.TableRow
 5021 instances of class java.lang.Integer
 4329 instances of class java.util.HashMap$Entry
 1521 instances of class java.lang.Class
 787 instances of class java.awt.Color
 667 instances of class java.util.HashMap$Entry[]
 658 instances of class java.util.HashMap
 330 instances of class java.util.Hashtable$Entry
 201 instances of class java.util.WeakHashMap$Entry
 181 instances of class org.apache.fop.fo.properties.EnumProperty
 160 instances of class
 java.util.concurrent.ConcurrentHashMap$HashEntry[] 160 instances of
 class java.util.concurrent.ConcurrentHashMap$Segment 160 instances of
 class java.util.concurrent.locks.ReentrantLock$NonfairSync 127
 instances of class java.lang.String[]
 116 instances of class java.util.LinkedHashMap$Entry
 110 instances of class byte[]
 ...


Richard,

very good stuff. I am trying to make sense of the numbers. Let me
paraphrase the data:

A table with 4 columns and 9710 rows. Each table cell contains a block
and there is also a block around the table. This gives us (roughly) the
87394 CommonBorderPaddingBackground and CondLengthProperty[] instances.
We also have an ArrayList (the property list) per formatting object and
each ArrayList is backed by an Object[].

What are the 81632 instances of class char[]? I assume this is the text
in the table cells. But why are there more than twice as many as there
are table cells?

Your summary also shows 126237 instances of class
org.xml.sax.helpers.LocatorImpl. I believe the only purpose of these
helpers if for providing location information in error messages. Looks
like a fairly expensive feature.

 As you can see, there's still a lot of value in getting the compound
 properties reusable - which I still intend on doing. This won't make
 it possible to handle arbitary documents, but will at least raise the
 bar somewhat.


Based on your above figures reusing identical compound property
instances would certainly be a very useful improvement. A possible next
step would be to reuse identical property lists. Especially in
documents with lots of identically formatted table cells this would
further reduce the memory footprint.

 I'm also currently reading through Knuth's Digital Typography. Can
 anyone point out any sections I should pay particular attention to
 w.r.t. FOP's usage,

 Regards,

 Richard

Manuel



Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Vincent Hennebert
Richard a écrit :
snip/
 I'm also currently reading through Knuth's Digital Typography. Can anyone
 point out any sections I should pay particular attention to w.r.t. FOP's
 usage,

(Digital Typography caught my eyes. I'll try to respond to the rest
later.)

Chapter 3, Breaking Paragraphs Into Lines, is definitely THE chapter
to read.
The first 2 chapters are dealing with font rendering, which goes too far
for us --but it may be interesting for your personal culture. I haven't
read the rest but it seems very TeX-oriented to me. Maybe interesting,
but obsoleted by the current font technology (Type1, OpenType) anyway.

I've considered to describe the linebreaking algorithm in a less cryptic
manner (variable names of more than one letter, mainly) on the Wiki [1],
but have never had the time to do it. If you ever want to do that, that
would be very valuable I think!

[1] http://wiki.apache.org/xmlgraphics-fop/KnuthsModel

Vincent


Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Peter B. West
On Tue, 2007-01-09 at 16:52 +0100, Vincent Hennebert wrote:
 Richard a écrit :
 snip/
  I'm also currently reading through Knuth's Digital Typography. Can anyone
  point out any sections I should pay particular attention to w.r.t. FOP's
  usage,
 
 (Digital Typography caught my eyes. I'll try to respond to the rest
 later.)
 
 Chapter 3, Breaking Paragraphs Into Lines, is definitely THE chapter
 to read.
 The first 2 chapters are dealing with font rendering, which goes too far
 for us --but it may be interesting for your personal culture. I haven't
 read the rest but it seems very TeX-oriented to me. Maybe interesting,
 but obsoleted by the current font technology (Type1, OpenType) anyway.
 
 I've considered to describe the linebreaking algorithm in a less cryptic
 manner (variable names of more than one letter, mainly) on the Wiki [1],
 but have never had the time to do it. If you ever want to do that, that
 would be very valuable I think!
 
 [1] http://wiki.apache.org/xmlgraphics-fop/KnuthsModel
 
 Vincent

And I thought about describing Knuth and Plass' algorithm in a less
cryptic manner, and from a slightly different point of view.
http://defoe.sourceforge.net/folio/knuth-plass.html

N.I.H.

Peter





Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Andreas L Delmelle

On Jan 9, 2007, at 14:11, Manuel Mall wrote:


snip /
What are the 81632 instances of class char[]? I assume this is the  
text

in the table cells. But why are there more than twice as many as there
are table cells?


Hehe, see my little remark about the TextLM... In its initialize()  
method (I think, will check later), the FOText's char array is copied  
(System.arraycopy()).


That means there are currently two nearly identical char arrays alive  
for each text node in the page sequence :(


Cheers,

Andreas


Re: FOP Memory issues (fwd from fop-users)

2007-01-09 Thread Andreas L Delmelle

On Jan 9, 2007, at 18:46, Andreas L Delmelle wrote:


On Jan 9, 2007, at 14:11, Manuel Mall wrote:


snip /
What are the 81632 instances of class char[]? I assume this is the  
text
in the table cells. But why are there more than twice as many as  
there

are table cells?


Hehe, see my little remark about the TextLM... In its initialize()  
method (I think, will check later), the FOText's char array is  
copied (System.arraycopy()).


Sorry, my bad. Just realized that Richard's data indicates that the  
layout stage hasn't even been reached at that point...? But that also  
makes the picture somewhat more dramatic, because




That means there are currently two nearly identical char arrays  
alive for each text node in the page sequence :(


this is till true, nonetheless, and that means that at layout stage  
there will be twice as many (16+ instances)


White-space nodes? Could also be an effect of the white-space  
collapsing. Long shot, but theoretically, 'white-space-handling' in  
FOText means 'replace FOText.ca with a copy minus a few characters',  
if the originals weren't GC'ed at the time of the snapshot...



A possible next
step would be to reuse identical property lists. Especially in
documents with lots of identically formatted table cells this would
further reduce the memory footprint.


Property lists themselves are no longer alive in the snapshot, it  
seems. I don't suppose they are that much of a problem. They are  
meant to be in scope for only a very brief interval (best case: from  
PropertyListMaker.make() until FObj.bind())
Except for the child-parent relation, there are virtually no strong  
references to PropertyLists. One in the MainFOHandler(), to pass down  
as a parentPropertyList to the others.

Notable exceptions are
- a table-column's PropertyList: could be needed to resolve calls to  
from-table-column()
- a retrieve-marker's PropertyList: is needed for deferred resolution  
of the marker properties


Possible improvement would be a subtype that collapses the tree of  
PropertyLists at a given point. Right now, every PropertyList holds a  
reference to its parent, and with that, to the whole ancestry...



Cheers,

Andreas



Re: FOP Memory issues (fwd from fop-users)

2007-01-08 Thread Andreas L Delmelle

On Jan 5, 2007, at 16:20, Jeremias Maerki wrote:

Adding page breaks will not be enough, BTW. But you already noticed  
that.
FOP can currently only release memory at the end of a page- 
sequence. So
instead of creating page-breaks, try to restart a new page- 
sequence. The

memory usage should drop considerably.


If I remember correctly, that was precisely the problem, since  
Cliff's report consists of one giant table. It's supposed to look  
like one uninterrupted flow, so figuring out where the page-sequences  
should end is next to impossible... (or IOW: sorting that out kind of  
defeats the purpose of using a formatter to compute the page-breaks) :/



There's also a little class (CachedRenderPagesModel) which could
theoretically be used instead of the default RenderPagesModel. It  
allows
to temporarily off-load rendered pages to disk if they can't be  
rendered

right away. But this is not actively tested and does not help with the
memory consumption of the FO tree which probably is representing the
largest part in your case.


The one way I see that FOP is ever going to get close to resolving  
the issue of arbitrarily sized page-sequences, is if the overall  
processing is 'slightly' modified (quoted, since it seems like only a  
small change, but it would still be quite some work for one man).


The redesign was ultimately meant to modularize FOP. Now the fo-tree  
and the layoutengine have been successfully extracted into separate  
modules, seems like it's time to revisit the way they work together.  
Currently, we have two monolithic modules performing their respective  
operations in sequential order. One module (layout) can't start until  
the other (fo-tree) has reached a critical boundary  
(FOEventHandler.endPageSequence()), and vice versa, the fo-tree can't  
continue until layout for a page-sequence has finished.


Very briefly put: the key would be to implement  
AreaTreeHandler.endBlock().
Use that event to start/resume the layout-loop (ideally this loop  
should run in a separate thread, so there would be real performance- 
boosts on MP-systems), and use endPageSequence() instead only to  
perform one finishing pass over the whole sequence.


Such a change could bring us closer to enhancing FOP in other areas  
as well.
Multiple endBlock() events each offer an opportunity for the  
PageSequenceLM to record available IPD changes, take into account  
footnotes/floats associated with a block etc.


Rough sketch:
At the very first endBlock() the parent FlowLM and PageSequenceLM are  
instantiated, and the first block-sequence is created. The breaker is  
run a first time, storing the resulting active nodes.
Every next occurrence of the event, the ancestor LMs and a set of  
active nodes are already present, a sequence for the current block is  
added, and the breaker is run again...
As such, the page-breaking algorithm would run incrementally,  
performing multiple passes over the same block-sequences.


As you can see from the simplistic sketch, I'm still a bit unsure  
about the specifics, but if all goes well, in the most  
straightforward cases, some LMs can begin adding their areas long  
before the physical end-of-page-sequence is reached. If that also  
implies they can release the reference to their FO (and instruct the  
FOTree to release the reference as well via FONode.removeChild()),  
large parts of the FOTree can be garbage-collected much sooner than  
they are now.
Think of the content of block-containers, non-marker parts of the  
static-content, table-headers/-footers. Even large text-blocks: note  
that the TextLM currently creates a copy of the corresponding  
FOText's char array, while the original happily occupies the same  
amount of memory.


The overall changes would be far from trivial though, AFAICT, but I'd  
love to see some more brainstorming in this direction. Biggest  
problem, IIC, is that AbstractBreaker.doLayout() currently performs  
everything in one go.




Cheers,

Andreas