Re: FOP Memory issues (fwd from fop-users)
On Jan 14, 2007, at 22:51, J.Pietschmann wrote: Andreas L Delmelle wrote: Based on Jörgs statistics, I'd say that the number of children will most likely never reach the level where using direct index- based access (ArrayList) has its benefits over traversing a tree of references (LinkedList). There may be FOs, specifically fo:flow and fo:table-body, which may have hundreds of children in real documents. If the FOs use access functions for the children, even for adding, each FO can implement a mechanism which suites its purposes best. In particular, page masters and tables can store the regions in typed fields, FOs which don't have children can get rid of the field completely. This happens already for some FOs. Tables for instance, override addChildNode() and store only bodies as their 'child nodes'. The header and footer have separate typed fields, the columns end up in a separate standard List. Behavior for markers is another example, they too end up in a separate collection (map). I've already been experimenting locally with removing the List member, and replacing it with pointers to the neighbors. As I re- implemented FObj.addChildNode(), it occurred to me that FOs with a possibly larger amount of children would need to traverse the entire list to append child nodes to the end. For a handful of children, this costs virtually nothing, but as the list gets bigger... Could easily be solved though for the applicable FOs, by keeping a reference to the last added child, and giving it an addSibling() method. I'm not sure whether the layout engine places any restrictions on the access to FO children. Is it possible that access is not random? FTM, there are only a handful of places in the code where the childNodes list is accessed using an index. The only way it can be accessed by classes outside the fo package is through FObj.getChildNodes(), and that method returns a ListIterator, not the list itself. The LMs only need access to these iterators while creating their childLMs, which is done in forward sequential order, i.e. for (Iterator i = AbstractLM.fobjIter; i.hasNext();) { ... } [Note: these are ArrayLists we're talking about. Iterators are far from the best approach with lists implementing RandomAccess. Not that this matters much if the lists are small...] This fobjIter is created in the constructor of the LM. IIC, this means that there is currently no way to, for example, instantiate the FlowLM as soon as the first endBlock() event occurs, and keep extending the flow's list of child nodes in a different thread. The FlowLM's fobjIter will then throw a ConcurrentModificationException(). In the structure I'm currently playing with, it would be possible to construct an iterator over an FObj whose child nodes are not all known, move to the last node in the list, put the iterator in a stack (or keep a reference as an instance member), continue adding nodes in another thread, return, and continue iterating (a call to next() will then return the first node that was added after the iterator was paused). That would be mighty dangerous if the access to the nodes was not so predictable, but as it stands now, this could turn out to be a benefit, if documented properly. Removal from and addition to the tail of the list are never a problem. The iterator will simply keep following the following- sibling reference until it is null. The head of the list might become a problem if one needs a decent implementation of nextIndex() and previousIndex(). If we never really need the index, then the fact that a child was added at a previous position will simply make the iterator go back a few steps further than where it started. Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
Andreas L Delmelle wrote: Based on Jörgs statistics, I'd say that the number of children will most likely never reach the level where using direct index-based access (ArrayList) has its benefits over traversing a tree of references (LinkedList). There may be FOs, specifically fo:flow and fo:table-body, which may have hundreds of children in real documents. If the FOs use access functions for the children, even for adding, each FO can implement a mechanism which suites its purposes best. In particular, page masters and tables can store the regions in typed fields, FOs which don't have children can get rid of the field completely. I'm not sure whether the layout engine places any restrictions on the access to FO children. Is it possible that access is not random? J.Pietschmann
Re: FOP Memory issues (fwd from fop-users)
On Jan 12, 2007, at 13:00, [EMAIL PROTECTED] wrote: snip / Just a quick comment on this, given the above stats you might consider having a single Object field and switch the type of object from the actual child (when there is a single child) to a List of children when there are multiple children: Also an interesting suggestion, thanks! I'm still wondering whether the use of separate lists could not be avoided altogether. Based on Jörgs statistics, I'd say that the number of children will most likely never reach the level where using direct index-based access (ArrayList) has its benefits over traversing a tree of references (LinkedList). On top of that, all we really seem to need further in the code, is an iterator over that list, not the list itself... I'm thinking, roughly all we need is something like: ... class Node { Object parent; Object firstChild; Object[] siblings; //if there are siblings, always length=2 ... class ChildIterator Node currentNode; ChildIterator(Node n) { currentNode = n.firstChild; } hasNext() { return (currentNode.siblings != null) (currentNode.siblings[1] != null); } next() { if (hasNext()) { return currentNode.siblings[1]; } else { throw new NoSuchElementException(); } } etc. The backing list would be defined by the pointers between the objects, and not exist as a separate object (list) itself. I'm still not completely sure about my estimation, but in the picture painted earlier (instance count of ArrayList and Object[]), those extra reference(s) for the siblings could turn out to be well worth it, since they'd only slightly increase the instance size of already existing objects, but they do avoid the creation of so many new ArrayList instances. Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
Hi all, Andreas L Delmelle [EMAIL PROTECTED] wrote on 01/11/2007 05:24:36 PM: On Jan 11, 2007, at 22:31, J.Pietschmann wrote: Quite some time ago I did some statistics on number of children of FOs, using the FOP examples and FO files from bug reports. The breakdown was roughly the following ~50% no children, mostly FOText nodes and FOs like region-body and page-number-citation ~40% one child, mostly blocks and inlines (fo:wrapper) having exactly one FOText node as child 10% 2..10 children Just a quick comment on this, given the above stats you might consider having a single Object field and switch the type of object from the actual child (when there is a single child) to a List of children when there are multiple children: Object child; Child getChild(int idx) { if (child instanceof List) return (Child)((List)child).get(idx); if (idx != 0) throw new IndexOutOfBoundsException(...); return (Child)child; } int getNumChildren() { if (child instanceof List) return ((List)child).size(); return 1; } I do this in Batik for the ID HashTable which has similar stats (you might say that the hashtable can only hold one entry but when dealing with things like DocumentFragments and detached DOM tree's it get's a little fuzzy what the ID rules really are). Admittedly the code is a little ugly, but you can hide it from all but the implementing class. Just something to think about it might give you a significant savings (might even be a tad faster...).
Re: FOP Memory issues (fwd from fop-users)
Andreas L Delmelle writes: I'd say the 80K ArrayLists are simply the childNodes lists of all those FObj (TableCells and Blocks), and that those are, in most cases, lists of only one element. This is correct - for most instances, Richard
Re: FOP Memory issues (fwd from fop-users)
On Jan 11, 2007, at 10:02, [EMAIL PROTECTED] wrote: Andreas L Delmelle writes: I'd say the 80K ArrayLists are simply the childNodes lists of all those FObj (TableCells and Blocks), and that those are, in most cases, lists of only one element. This is correct - for most instances, Which brings us to another important piece of information that would be interesting to know: how big are each of those instances? Note that our ArrayLists are all created using the default constructor (here and there, you'll find a few where an initialCapacity is supplied). If the majority of those lists contain only one element, there will still be backing Object arrays of 10 elements (9 out of 10 references remain null). If we'd implement our own FONodeList() and supply it with a compact() method to shrink the backing array to the size that is needed to store all childNodes, and make sure this method is called upon in endOfNode() when all children are known. This might again save a few... Rough calculation, supposing 4 Bytes needed for each empty reference: (80K x (4 x 9)) Bytes ~ 2.8 MB wasted on nulls, IIC. Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
Andreas L Delmelle wrote: Which brings us to another important piece of information that would be interesting to know: how big are each of those instances? Quite some time ago I did some statistics on number of children of FOs, using the FOP examples and FO files from bug reports. The breakdown was roughly the following ~50% no children, mostly FOText nodes and FOs like region-body and page-number-citation ~40% one child, mostly blocks and inlines (fo:wrapper) having exactly one FOText node as child 10% 2..10 children 1% more than 10 children, mostly fo:flow, table and table-body and a few blocks, usually wrapping other blocks. Real world documents with more tables and inline formatting might have more multi-child FOs. I haven't checked whether FOText still inherits the children field on trunk. If so, it is certainly a good idea to get rid of this (in the maintenance branch, this had widespread implications). The case of exactly one child might be worth optimizing too. Two possible solutions: A) all FO node implement a FOContainer interface, for example FONode childAt(int) int numberOfChildren() where FOText for example would hardcode return values of null and 0. B) Use a FOChildrenIterator interface with specific implementations for FO nodes which can have none or exactly one child. Furthermore, in the maintenance branch most of the more specific FOs copied children from the generic children list into properly typed fields before starting layout, in many cases the generic children list could have been deleted afterwards if this wouldn't have broken a few generic recursive algorithms like the one adjusting available space due to footnotes. The discussion then had Keiron said he'd even get rid of the generic children list in favour for properly typed fields, thereby giving up some flexibility needed for extensions. This could be implemented with either of the approaches above too (FOContainer interface or FOChildrenIterator interface). J.Pietschmann
Re: FOP Memory issues (fwd from fop-users)
On Jan 11, 2007, at 22:31, J.Pietschmann wrote: Quite some time ago I did some statistics on number of children of FOs, using the FOP examples and FO files from bug reports. The breakdown was roughly the following ~50% no children, mostly FOText nodes and FOs like region-body and page-number-citation ~40% one child, mostly blocks and inlines (fo:wrapper) having exactly one FOText node as child 10% 2..10 children 1% more than 10 children, mostly fo:flow, table and table-body and a few blocks, usually wrapping other blocks. Real world documents with more tables and inline formatting might have more multi-child FOs. Interesting figures... I haven't checked whether FOText still inherits the children field on trunk. If so, it is certainly a good idea to get rid of this (in the maintenance branch, this had widespread implications). The case of exactly one child might be worth optimizing too. This was indeed altered, I don't know when, and by whom precisely (Glen or Finn, IIRC). Anyways, the hierarchy is currently: FONode |-FOText |-FObj and only FObj has a protected childNodes instance member, which is a generic ArrayList (and as I hinted, they are all created with: new java.util.ArrayList(), which defaults to an initial backing Object[10]). A FONode only holds the reference to the parent in the tree, and the FObj.childNodes list is only created when FObj.addChildNode() is called, so if there is no child element, this reference will always be null. Two possible solutions: A) all FO node implement a FOContainer interface, for example FONode childAt(int) int numberOfChildren() where FOText for example would hardcode return values of null and 0. B) Use a FOChildrenIterator interface with specific implementations for FO nodes which can have none or exactly one child. I was already thinking along the lines of creating a subclass of Vector, or an implementation of List, but I'm beginning to wonder if it wouldn't be worth it to create a link between the children...? Instead of holding a reference to an ArrayList, each FObj would have three references: parent, firstChild, nextSibling. Add that FOContainer interface or a FONodeIterator to navigate through it... could work out. 8) The benefit would be that the average size of an FObj becomes much more transparent: three references, the properties and a handful of private helpers. Furthermore, in the maintenance branch most of the more specific FOs copied children from the generic children list into properly typed fields before starting layout, in many cases the generic children list could have been deleted afterwards if this wouldn't have broken a few generic recursive algorithms like the one adjusting available space due to footnotes. The discussion then had Keiron said he'd even get rid of the generic children list in favour for properly typed fields, thereby giving up some flexibility needed for extensions. In the trunk, extensions are treated very differently from what I can tell. Mainly thanks to Jeremias, IIC, who made extensive changes when implementing XMP metadata. They are stored in a separate collection (ExtensionAttachments), which adds its own flexibility, but also difficulty. I'm still not sure how one would have to write an arbitrary fo extension that is supposed to influence the flow of the layout algorithm without needing access to the deeper layout API. Maybe we'd need a sort of generic ExtensionLayoutManager, too...? Andreas
Re: FOP Memory issues (fwd from fop-users)
Manuel Mall writes: On Wednesday 10 January 2007 03:42, Andreas L Delmelle wrote: White-space nodes? Could also be an effect of the white-space collapsing. Long shot, but theoretically, 'white-space-handling' in FOText means 'replace FOText.ca with a copy minus a few characters', if the originals weren't GC'ed at the time of the snapshot... I assumed the snapshot shows only reachable objects, i.e. objects not eligible for garbage collection. Otherwise it could be seriously misleading. The whole thing bombs out with an OutOfMemoryError. If it didn't I wouldn't be spending my time dealing with it :) The contract for this explicitly says that garbage collection will have called before then. Richard
Re: FOP Memory issues (fwd from fop-users)
On Jan 10, 2007, at 20:39, Andreas L Delmelle wrote: On Jan 10, 2007, at 04:50, Manuel Mall wrote: On Wednesday 10 January 2007 03:42, Andreas L Delmelle wrote: snip / Property lists themselves are no longer alive in the snapshot, it seems. I don't suppose they are that much of a problem. They are meant to be in scope for only a very brief interval (best case: from PropertyListMaker.make() until FObj.bind()) Hmm, what are the 80K+ ArrayLists and their backing Object[] then? That remains the question, of course... If it had been the PropertyLists, there would also be 80K instances of StaticPropertyList. BTW: PropertyLists use plain arrays of Property[] (not ArrayLists) I'd say the 80K ArrayLists are simply the childNodes lists of all those FObj (TableCells and Blocks), and that those are, in most cases, lists of only one element. I have read in relation to Java performance tuning that, if you know in advance which type of element will be stored in a Collection, it almost always pays off to write your own implementation... In this case, we could add a FONodeList class, that implements the java.util.List interface, and have it backed by a FONode[]. This could at least drastically reduce the number of casts made in the fo package. Hmm, OK, this is such a good idea that I might even get started on it right away :) Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
For all the talk of FOP memory issues, does anyone have any information as to whether the problem is worse than in FOP 0.20.5 -- and, if so, by how much? I'm considering upgrading what we require/bundle to 0.93 and currently have reflection code allowing operation against 0.93 or 0.20.5 -- whatever's present. I don't want to step up to 0.93 and discover memory usage for things that used to work in 0.20.5 is many times higher or such, though... -- Jess Holle
Re: FOP Memory issues (fwd from fop-users)
Andreas L Delmelle writes: If I remember correctly, that was precisely the problem, since Cliff's report consists of one giant table. It's supposed to look like one uninterrupted flow, so figuring out where the page-sequences should end is next to impossible... (or IOW: sorting that out kind of defeats the purpose of using a formatter to compute the page-breaks) :/ That's exactly the same problem I ran up against. Hence my investigations into properties and memory consumption. On my current setup (as per the last patch under 41044) and using my current 16MB fo file, the object counts are as follows: 349576 instances of class org.apache.fop.fo.properties.CondLengthProperty 145647 instances of class org.apache.fop.fo.properties.KeepProperty 126237 instances of class org.xml.sax.helpers.LocatorImpl 116521 instances of class org.apache.fop.fo.properties.SpaceProperty 87814 instances of class java.lang.Object[] 87458 instances of class java.util.ArrayList 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo[] 87394 instances of class org.apache.fop.fo.properties.CondLengthProperty[] 81632 instances of class char[] 48548 instances of class org.apache.fop.fo.properties.LengthRangeProperty 42649 instances of class java.lang.String 38841 instances of class org.apache.fop.fo.properties.CommonMarginBlock 38839 instances of class org.apache.fop.datatypes.LengthBase 38839 instances of class org.apache.fop.fo.properties.PercentLength 38838 instances of class org.apache.fop.fo.flow.Block 38836 instances of class org.apache.fop.fo.flow.TableCell 9710 instances of class org.apache.fop.fo.flow.TableRow 5021 instances of class java.lang.Integer 4329 instances of class java.util.HashMap$Entry 1521 instances of class java.lang.Class 787 instances of class java.awt.Color 667 instances of class java.util.HashMap$Entry[] 658 instances of class java.util.HashMap 330 instances of class java.util.Hashtable$Entry 201 instances of class java.util.WeakHashMap$Entry 181 instances of class org.apache.fop.fo.properties.EnumProperty 160 instances of class java.util.concurrent.ConcurrentHashMap$HashEntry[] 160 instances of class java.util.concurrent.ConcurrentHashMap$Segment 160 instances of class java.util.concurrent.locks.ReentrantLock$NonfairSync 127 instances of class java.lang.String[] 116 instances of class java.util.LinkedHashMap$Entry 110 instances of class byte[] ... As you can see, there's still a lot of value in getting the compound properties reusable - which I still intend on doing. This won't make it possible to handle arbitary documents, but will at least raise the bar somewhat. I'm also currently reading through Knuth's Digital Typography. Can anyone point out any sections I should pay particular attention to w.r.t. FOP's usage, Regards, Richard
Re: FOP Memory issues (fwd from fop-users)
On Tuesday 09 January 2007 18:13, [EMAIL PROTECTED] wrote: Andreas L Delmelle writes: If I remember correctly, that was precisely the problem, since Cliff's report consists of one giant table. It's supposed to look like one uninterrupted flow, so figuring out where the page-sequences should end is next to impossible... (or IOW: sorting that out kind of defeats the purpose of using a formatter to compute the page-breaks) :/ That's exactly the same problem I ran up against. Hence my investigations into properties and memory consumption. On my current setup (as per the last patch under 41044) and using my current 16MB fo file, the object counts are as follows: 349576 instances of class org.apache.fop.fo.properties.CondLengthProperty 145647 instances of class org.apache.fop.fo.properties.KeepProperty 126237 instances of class org.xml.sax.helpers.LocatorImpl 116521 instances of class org.apache.fop.fo.properties.SpaceProperty 87814 instances of class java.lang.Object[] 87458 instances of class java.util.ArrayList 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo [] 87394 instances of class org.apache.fop.fo.properties.CondLengthProperty[] 81632 instances of class char[] 48548 instances of class org.apache.fop.fo.properties.LengthRangeProperty 42649 instances of class java.lang.String 38841 instances of class org.apache.fop.fo.properties.CommonMarginBlock 38839 instances of class org.apache.fop.datatypes.LengthBase 38839 instances of class org.apache.fop.fo.properties.PercentLength 38838 instances of class org.apache.fop.fo.flow.Block 38836 instances of class org.apache.fop.fo.flow.TableCell 9710 instances of class org.apache.fop.fo.flow.TableRow 5021 instances of class java.lang.Integer 4329 instances of class java.util.HashMap$Entry 1521 instances of class java.lang.Class 787 instances of class java.awt.Color 667 instances of class java.util.HashMap$Entry[] 658 instances of class java.util.HashMap 330 instances of class java.util.Hashtable$Entry 201 instances of class java.util.WeakHashMap$Entry 181 instances of class org.apache.fop.fo.properties.EnumProperty 160 instances of class java.util.concurrent.ConcurrentHashMap$HashEntry[] 160 instances of class java.util.concurrent.ConcurrentHashMap$Segment 160 instances of class java.util.concurrent.locks.ReentrantLock$NonfairSync 127 instances of class java.lang.String[] 116 instances of class java.util.LinkedHashMap$Entry 110 instances of class byte[] ... Richard, very good stuff. I am trying to make sense of the numbers. Let me paraphrase the data: A table with 4 columns and 9710 rows. Each table cell contains a block and there is also a block around the table. This gives us (roughly) the 87394 CommonBorderPaddingBackground and CondLengthProperty[] instances. We also have an ArrayList (the property list) per formatting object and each ArrayList is backed by an Object[]. What are the 81632 instances of class char[]? I assume this is the text in the table cells. But why are there more than twice as many as there are table cells? Your summary also shows 126237 instances of class org.xml.sax.helpers.LocatorImpl. I believe the only purpose of these helpers if for providing location information in error messages. Looks like a fairly expensive feature. As you can see, there's still a lot of value in getting the compound properties reusable - which I still intend on doing. This won't make it possible to handle arbitary documents, but will at least raise the bar somewhat. Based on your above figures reusing identical compound property instances would certainly be a very useful improvement. A possible next step would be to reuse identical property lists. Especially in documents with lots of identically formatted table cells this would further reduce the memory footprint. I'm also currently reading through Knuth's Digital Typography. Can anyone point out any sections I should pay particular attention to w.r.t. FOP's usage, Regards, Richard Manuel
Re: FOP Memory issues (fwd from fop-users)
Perhaps the flyweight pattern, described by the GoF, may be of use to whoever is going to look into an implementation strategy. http://en.wikipedia.org/wiki/Flyweight_pattern gives a decent example. amin On 1/9/07, Manuel Mall [EMAIL PROTECTED] wrote: On Tuesday 09 January 2007 18:13, [EMAIL PROTECTED] wrote: Andreas L Delmelle writes: If I remember correctly, that was precisely the problem, since Cliff's report consists of one giant table. It's supposed to look like one uninterrupted flow, so figuring out where the page-sequences should end is next to impossible... (or IOW: sorting that out kind of defeats the purpose of using a formatter to compute the page-breaks) :/ That's exactly the same problem I ran up against. Hence my investigations into properties and memory consumption. On my current setup (as per the last patch under 41044) and using my current 16MB fo file, the object counts are as follows: 349576 instances of class org.apache.fop.fo.properties.CondLengthProperty 145647 instances of class org.apache.fop.fo.properties.KeepProperty 126237 instances of class org.xml.sax.helpers.LocatorImpl 116521 instances of class org.apache.fop.fo.properties.SpaceProperty 87814 instances of class java.lang.Object[] 87458 instances of class java.util.ArrayList 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground 87394 instances of class org.apache.fop.fo.properties.CommonBorderPaddingBackground$BorderInfo [] 87394 instances of class org.apache.fop.fo.properties.CondLengthProperty[] 81632 instances of class char[] 48548 instances of class org.apache.fop.fo.properties.LengthRangeProperty 42649 instances of class java.lang.String 38841 instances of class org.apache.fop.fo.properties.CommonMarginBlock 38839 instances of class org.apache.fop.datatypes.LengthBase 38839 instances of class org.apache.fop.fo.properties.PercentLength 38838 instances of class org.apache.fop.fo.flow.Block 38836 instances of class org.apache.fop.fo.flow.TableCell 9710 instances of class org.apache.fop.fo.flow.TableRow 5021 instances of class java.lang.Integer 4329 instances of class java.util.HashMap$Entry 1521 instances of class java.lang.Class 787 instances of class java.awt.Color 667 instances of class java.util.HashMap$Entry[] 658 instances of class java.util.HashMap 330 instances of class java.util.Hashtable$Entry 201 instances of class java.util.WeakHashMap$Entry 181 instances of class org.apache.fop.fo.properties.EnumProperty 160 instances of class java.util.concurrent.ConcurrentHashMap$HashEntry[] 160 instances of class java.util.concurrent.ConcurrentHashMap$Segment 160 instances of class java.util.concurrent.locks.ReentrantLock$NonfairSync 127 instances of class java.lang.String[] 116 instances of class java.util.LinkedHashMap$Entry 110 instances of class byte[] ... Richard, very good stuff. I am trying to make sense of the numbers. Let me paraphrase the data: A table with 4 columns and 9710 rows. Each table cell contains a block and there is also a block around the table. This gives us (roughly) the 87394 CommonBorderPaddingBackground and CondLengthProperty[] instances. We also have an ArrayList (the property list) per formatting object and each ArrayList is backed by an Object[]. What are the 81632 instances of class char[]? I assume this is the text in the table cells. But why are there more than twice as many as there are table cells? Your summary also shows 126237 instances of class org.xml.sax.helpers.LocatorImpl. I believe the only purpose of these helpers if for providing location information in error messages. Looks like a fairly expensive feature. As you can see, there's still a lot of value in getting the compound properties reusable - which I still intend on doing. This won't make it possible to handle arbitary documents, but will at least raise the bar somewhat. Based on your above figures reusing identical compound property instances would certainly be a very useful improvement. A possible next step would be to reuse identical property lists. Especially in documents with lots of identically formatted table cells this would further reduce the memory footprint. I'm also currently reading through Knuth's Digital Typography. Can anyone point out any sections I should pay particular attention to w.r.t. FOP's usage, Regards, Richard Manuel
Re: FOP Memory issues (fwd from fop-users)
Richard a écrit : snip/ I'm also currently reading through Knuth's Digital Typography. Can anyone point out any sections I should pay particular attention to w.r.t. FOP's usage, (Digital Typography caught my eyes. I'll try to respond to the rest later.) Chapter 3, Breaking Paragraphs Into Lines, is definitely THE chapter to read. The first 2 chapters are dealing with font rendering, which goes too far for us --but it may be interesting for your personal culture. I haven't read the rest but it seems very TeX-oriented to me. Maybe interesting, but obsoleted by the current font technology (Type1, OpenType) anyway. I've considered to describe the linebreaking algorithm in a less cryptic manner (variable names of more than one letter, mainly) on the Wiki [1], but have never had the time to do it. If you ever want to do that, that would be very valuable I think! [1] http://wiki.apache.org/xmlgraphics-fop/KnuthsModel Vincent
Re: FOP Memory issues (fwd from fop-users)
On Tue, 2007-01-09 at 16:52 +0100, Vincent Hennebert wrote: Richard a écrit : snip/ I'm also currently reading through Knuth's Digital Typography. Can anyone point out any sections I should pay particular attention to w.r.t. FOP's usage, (Digital Typography caught my eyes. I'll try to respond to the rest later.) Chapter 3, Breaking Paragraphs Into Lines, is definitely THE chapter to read. The first 2 chapters are dealing with font rendering, which goes too far for us --but it may be interesting for your personal culture. I haven't read the rest but it seems very TeX-oriented to me. Maybe interesting, but obsoleted by the current font technology (Type1, OpenType) anyway. I've considered to describe the linebreaking algorithm in a less cryptic manner (variable names of more than one letter, mainly) on the Wiki [1], but have never had the time to do it. If you ever want to do that, that would be very valuable I think! [1] http://wiki.apache.org/xmlgraphics-fop/KnuthsModel Vincent And I thought about describing Knuth and Plass' algorithm in a less cryptic manner, and from a slightly different point of view. http://defoe.sourceforge.net/folio/knuth-plass.html N.I.H. Peter
Re: FOP Memory issues (fwd from fop-users)
On Jan 9, 2007, at 14:11, Manuel Mall wrote: snip / What are the 81632 instances of class char[]? I assume this is the text in the table cells. But why are there more than twice as many as there are table cells? Hehe, see my little remark about the TextLM... In its initialize() method (I think, will check later), the FOText's char array is copied (System.arraycopy()). That means there are currently two nearly identical char arrays alive for each text node in the page sequence :( Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
On Jan 9, 2007, at 18:46, Andreas L Delmelle wrote: On Jan 9, 2007, at 14:11, Manuel Mall wrote: snip / What are the 81632 instances of class char[]? I assume this is the text in the table cells. But why are there more than twice as many as there are table cells? Hehe, see my little remark about the TextLM... In its initialize() method (I think, will check later), the FOText's char array is copied (System.arraycopy()). Sorry, my bad. Just realized that Richard's data indicates that the layout stage hasn't even been reached at that point...? But that also makes the picture somewhat more dramatic, because That means there are currently two nearly identical char arrays alive for each text node in the page sequence :( this is till true, nonetheless, and that means that at layout stage there will be twice as many (16+ instances) White-space nodes? Could also be an effect of the white-space collapsing. Long shot, but theoretically, 'white-space-handling' in FOText means 'replace FOText.ca with a copy minus a few characters', if the originals weren't GC'ed at the time of the snapshot... A possible next step would be to reuse identical property lists. Especially in documents with lots of identically formatted table cells this would further reduce the memory footprint. Property lists themselves are no longer alive in the snapshot, it seems. I don't suppose they are that much of a problem. They are meant to be in scope for only a very brief interval (best case: from PropertyListMaker.make() until FObj.bind()) Except for the child-parent relation, there are virtually no strong references to PropertyLists. One in the MainFOHandler(), to pass down as a parentPropertyList to the others. Notable exceptions are - a table-column's PropertyList: could be needed to resolve calls to from-table-column() - a retrieve-marker's PropertyList: is needed for deferred resolution of the marker properties Possible improvement would be a subtype that collapses the tree of PropertyLists at a given point. Right now, every PropertyList holds a reference to its parent, and with that, to the whole ancestry... Cheers, Andreas
Re: FOP Memory issues (fwd from fop-users)
On Jan 5, 2007, at 16:20, Jeremias Maerki wrote: Adding page breaks will not be enough, BTW. But you already noticed that. FOP can currently only release memory at the end of a page- sequence. So instead of creating page-breaks, try to restart a new page- sequence. The memory usage should drop considerably. If I remember correctly, that was precisely the problem, since Cliff's report consists of one giant table. It's supposed to look like one uninterrupted flow, so figuring out where the page-sequences should end is next to impossible... (or IOW: sorting that out kind of defeats the purpose of using a formatter to compute the page-breaks) :/ There's also a little class (CachedRenderPagesModel) which could theoretically be used instead of the default RenderPagesModel. It allows to temporarily off-load rendered pages to disk if they can't be rendered right away. But this is not actively tested and does not help with the memory consumption of the FO tree which probably is representing the largest part in your case. The one way I see that FOP is ever going to get close to resolving the issue of arbitrarily sized page-sequences, is if the overall processing is 'slightly' modified (quoted, since it seems like only a small change, but it would still be quite some work for one man). The redesign was ultimately meant to modularize FOP. Now the fo-tree and the layoutengine have been successfully extracted into separate modules, seems like it's time to revisit the way they work together. Currently, we have two monolithic modules performing their respective operations in sequential order. One module (layout) can't start until the other (fo-tree) has reached a critical boundary (FOEventHandler.endPageSequence()), and vice versa, the fo-tree can't continue until layout for a page-sequence has finished. Very briefly put: the key would be to implement AreaTreeHandler.endBlock(). Use that event to start/resume the layout-loop (ideally this loop should run in a separate thread, so there would be real performance- boosts on MP-systems), and use endPageSequence() instead only to perform one finishing pass over the whole sequence. Such a change could bring us closer to enhancing FOP in other areas as well. Multiple endBlock() events each offer an opportunity for the PageSequenceLM to record available IPD changes, take into account footnotes/floats associated with a block etc. Rough sketch: At the very first endBlock() the parent FlowLM and PageSequenceLM are instantiated, and the first block-sequence is created. The breaker is run a first time, storing the resulting active nodes. Every next occurrence of the event, the ancestor LMs and a set of active nodes are already present, a sequence for the current block is added, and the breaker is run again... As such, the page-breaking algorithm would run incrementally, performing multiple passes over the same block-sequences. As you can see from the simplistic sketch, I'm still a bit unsure about the specifics, but if all goes well, in the most straightforward cases, some LMs can begin adding their areas long before the physical end-of-page-sequence is reached. If that also implies they can release the reference to their FO (and instruct the FOTree to release the reference as well via FONode.removeChild()), large parts of the FOTree can be garbage-collected much sooner than they are now. Think of the content of block-containers, non-marker parts of the static-content, table-headers/-footers. Even large text-blocks: note that the TextLM currently creates a copy of the corresponding FOText's char array, while the original happily occupies the same amount of memory. The overall changes would be far from trivial though, AFAICT, but I'd love to see some more brainstorming in this direction. Biggest problem, IIC, is that AbstractBreaker.doLayout() currently performs everything in one go. Cheers, Andreas