[ https://issues.apache.org/jira/browse/BATIK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erich Schubert updated BATIK-1183:
----------------------------------
    Description:

In ELKI, we use Batik for scatterplots.
Marker symbols are generated as <symbol> tags, with a <use> at each individual location. This is nice for post-editing (because the symbols can be changed in a single place), but the performance of this approach is pretty bad (to the point where I am considering dropping Batik and trying something else).

When analyzing performance bottlenecks, I noticed the following things:
1. A substantial amount of time (way too much) goes into listener list management (yes, I want support for dynamic changes; so I do need listeners). It seems that for every <use>, several listeners are added?
2. String.intern is a major performance factor. I understand that we need to intern strings, but we need to avoid redoing it so often.
3. When a <symbol> is used, it gets cloned. With thousands of <use> tags, this leads to a substantial cost, in particular because every string is interned again for every usage. (org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls 'importNode')

Attached is a file that shows the performance bottleneck, in particular when interactions are enabled.

I have tried to improve some of these things in my speedup branch:
https://github.com/kno10/batik/tree/fixesAndSpeed

In this branch:
- the namespace SVGConstants.SVG_NAMESPACE_URI is recognized and the call to String.intern() is avoided. This is the default namespace for SVG, and the constant will point to the interned version.
- the custom "Hashtable" has been removed and replaced with a type-safe HashMap<> (which should actually be faster).
- the listener list management is now much simpler (and more efficient, as some of the functionality wasn't ever used anywhere).

But I could not tackle reducing the number of listeners and the cloning, as I am not deep enough into Batik internals. I understand the listeners are meant to propagate changes to the symbol to all the copies, but maybe we can instead have one shared listener on the <symbol> tag for all the <use> tags, not one listener per <use> tag?

Without using '<symbol>' and '<use>', performance is much better, but the file becomes harder to edit and twice as large. :-(

  was:

In ELKI, we use Batik for Scatterplots.
Marker symbols are generated as <symbol> tags, with a <use> at each individual location. This is nice for post-editing (because the symbols can be changed in a single place), but the performance of this approach is pretty bad (to the point where I am considering dropping Batik and trying something else).

When analyzing performance bottlenecks, I noticed the following things:
1. A substantial amount of time (way too much) goes into listener list management (yes, I want support for dynamic changes; so I do need listeners). It seems that for every <use>, several listeners are added?
2. String.intern is a major performance factor. I understand that we need to intern strings, but we need to avoid redoing it so often.
3. When a <symbol> is used, it gets cloned. With thousands of <use> tags, this leads to a substantial cost, in particular because every string is interned again for every usage. (org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls 'importNode')

I have tried to improve some of these things in my speedup branch:
https://github.com/kno10/batik/tree/fixesAndSpeed

In particular, SVGConstants.SVG_NAMESPACE_URI is recognized and not interned, as we expect to see this namespace very often; and I replaced the listener list management with something much simpler (and more efficient, as some of the functionality wasn't ever used).
I could not tackle the number of listeners and the cloning, as I am not deep enough into Batik internals.
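The SVG_NAMESPACE_URI shortcut mentioned above can be sketched roughly as follows. This is a standalone illustration, not the code from the fixesAndSpeed branch: the class InternShortcut and the helper internNamespace are hypothetical names, and the constant here is a local stand-in for SVGConstants.SVG_NAMESPACE_URI. It relies only on the fact that Java string constants are already interned.

```java
class InternShortcut {
    // Local stand-in for SVGConstants.SVG_NAMESPACE_URI (assumption for this sketch).
    // As a compile-time constant, this String is already in the intern pool.
    static final String SVG_NAMESPACE_URI = "http://www.w3.org/2000/svg";

    // Hypothetical helper: return the canonical instance for a namespace URI,
    // skipping String.intern() for the one namespace seen almost everywhere.
    static String internNamespace(String ns) {
        if (ns == null) {
            return null;
        }
        if (ns == SVG_NAMESPACE_URI) {
            return ns; // already the canonical instance, nothing to do
        }
        if (SVG_NAMESPACE_URI.equals(ns)) {
            return SVG_NAMESPACE_URI; // equal content: reuse the constant
        }
        return ns.intern(); // rare case: fall back to the intern pool
    }

    public static void main(String[] args) {
        // A distinct String instance, as a parser or a node clone might produce.
        String parsed = new String("http://www.w3.org/2000/svg");
        String canonical = internNamespace(parsed);
        System.out.println(canonical == SVG_NAMESPACE_URI); // prints "true"
    }
}
```

Whether the equals() comparison is actually cheaper than String.intern() depends on the JVM and on how often other namespaces occur, so this would need to be confirmed by profiling.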
> Performance of <use> and <symbol>
> ---------------------------------
>
>                 Key: BATIK-1183
>                 URL: https://issues.apache.org/jira/browse/BATIK-1183
>             Project: Batik
>          Issue Type: Improvement
>          Components: Bridge
>    Affects Versions: trunk
>            Reporter: Erich Schubert
>              Labels: performance
>         Attachments: scatter.svg.gz
>
>
> In ELKI, we use Batik for scatterplots.
> Marker symbols are generated as <symbol> tags, with a <use> at each
> individual location. This is nice for post-editing (because the symbols can
> be changed in a single place), but the performance of this approach is pretty
> bad (to the point where I am considering dropping Batik and trying something
> else).
> When analyzing performance bottlenecks, I noticed the following things:
> 1. A substantial amount of time (way too much) goes into listener list
> management (yes, I want support for dynamic changes; so I do need listeners).
> It seems that for every <use>, several listeners are added?
> 2. String.intern is a major performance factor. I understand that we need to
> intern strings, but we need to avoid redoing it so often.
> 3. When a <symbol> is used, it gets cloned. With thousands of <use> tags,
> this leads to a substantial cost, in particular because every string is
> interned again for every usage.
> (org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls
> 'importNode')
> Attached is a file that shows the performance bottleneck, in particular when
> interactions are enabled.
> I have tried to improve some of these things in my speedup branch:
> https://github.com/kno10/batik/tree/fixesAndSpeed
> In this branch:
> - the namespace SVGConstants.SVG_NAMESPACE_URI is recognized and the call to
> String.intern() is avoided. This is the default namespace for SVG, and the
> constant will point to the interned version.
> - the custom "Hashtable" has been removed and replaced with a type-safe
> HashMap<> (which should actually be faster).
> - the listener list management is now much simpler (and more efficient, as
> some of the functionality wasn't ever used anywhere).
> But I could not tackle reducing the number of listeners and the cloning, as I
> am not deep enough into Batik internals. I understand the listeners are meant
> to propagate changes to the symbol to all the copies, but maybe we can instead
> have one shared listener on the <symbol> tag for all the <use> tags, not one
> listener per <use> tag?
> Without using '<symbol>' and '<use>', performance is much better, but the
> file becomes harder to edit and twice as large. :-(

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
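The shared-listener idea raised at the end of the description could look roughly like the sketch below. It deliberately does not use Batik's bridge or DOM event classes: Symbol, UseInstance, ChangeListener and SharedSymbolListener are all hypothetical stand-ins, assumed only for illustration. The point is just the shape of the change: the <symbol> carries a single listener, and that listener fans updates out to every registered <use>.

```java
import java.util.ArrayList;
import java.util.List;

class SharedListenerSketch {
    /** Hypothetical stand-in for a DOM mutation listener. */
    interface ChangeListener {
        void changed(String attribute);
    }

    /** Hypothetical stand-in for a <symbol> element with a listener list. */
    static class Symbol {
        final List<ChangeListener> listeners = new ArrayList<>();

        void addListener(ChangeListener l) {
            listeners.add(l);
        }

        void fireChanged(String attribute) {
            for (ChangeListener l : listeners) {
                l.changed(attribute);
            }
        }
    }

    /** Hypothetical stand-in for the rendered copy behind one <use>. */
    static class UseInstance {
        int refreshCount = 0;

        void refresh(String attribute) {
            refreshCount++; // a real implementation would update its graphics node
        }
    }

    /** One listener per symbol; it forwards each change to all registered uses. */
    static class SharedSymbolListener implements ChangeListener {
        final List<UseInstance> uses = new ArrayList<>();

        void register(UseInstance use) {
            uses.add(use);
        }

        @Override
        public void changed(String attribute) {
            for (UseInstance use : uses) {
                use.refresh(attribute);
            }
        }
    }

    public static void main(String[] args) {
        Symbol symbol = new Symbol();
        SharedSymbolListener shared = new SharedSymbolListener();
        symbol.addListener(shared); // the only listener added to the symbol

        List<UseInstance> uses = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            UseInstance use = new UseInstance();
            shared.register(use); // plain list append instead of a new listener
            uses.add(use);
        }

        symbol.fireChanged("fill");
        System.out.println(symbol.listeners.size());    // prints 1
        System.out.println(uses.get(999).refreshCount); // prints 1
    }
}
```

This only addresses the listener count. The per-<use> cloning via importNode is a separate cost that the sketch does not touch.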