Thanks for the submission. I have opened bug 60416 [1] to work on this and committed your patch.
Discussion can take place either on the bug or on the [email protected] mailing list. I have bcc'd [email protected] out of the email responses. [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60416 On Thu, Nov 24, 2016 at 3:24 AM, Dmitry Katsubo <[email protected]> wrote: > Dear POI community, > > I am trying to use SXSSFWorkbook (streaming version of workbook) to generate > XLSX spreadsheet with many rows and hyperlinks in the cells. It turned out > that effective streaming of such document is not possible, as hyperlinks are > accumulated in XSSFSheet#hyperlinks (see [1]). It's clear why this is > necessary: <hyperlinks> XML element goes after <sheetData> in <worksheet>, so > they need to be somehow preserved until all data rows are written. However it > turned out that each hyperlink is 2KB size, while the "useful" data (the > string link itself) is about 300 bytes (75% memory overhead). For document > with 60000 hyperlinks, the memory used is 132MB (see attached > poi-hyperlink-dump.png). The most of memory is consumed by Cur$Locations > which has 7 arrays, 32 elements each, consuming 1KB in total (see attached > poi-hyperlink-dump-detailed.png). The stack trace is the following: > > Thread [main] (Suspended (breakpoint at line 493 in > org.apache.xmlbeans.impl.store.Cur$Locations)) > > org.apache.xmlbeans.impl.store.Cur$Locations.<init>(org.apache.xmlbeans.impl.store.Locale) > line: 493 > > org.apache.xmlbeans.impl.store.Locale.<init>(org.apache.xmlbeans.SchemaTypeLoader, > org.apache.xmlbeans.XmlOptions) line: 169 > > org.apache.xmlbeans.impl.store.Locale.getLocale(org.apache.xmlbeans.SchemaTypeLoader, > org.apache.xmlbeans.XmlOptions) line: 241 > > org.apache.xmlbeans.impl.store.Locale.newInstance(org.apache.xmlbeans.SchemaTypeLoader, > org.apache.xmlbeans.SchemaType, org.apache.xmlbeans.XmlOptions) line: 592 > > org.apache.xmlbeans.impl.schema.SchemaTypeLoaderImpl(org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase).newInstance(org.apache.xmlbeans.SchemaType, > org.apache.xmlbeans.XmlOptions) line: 198 > > org.apache.poi.POIXMLTypeLoader.newInstance(org.apache.xmlbeans.SchemaType, > org.apache.xmlbeans.XmlOptions) line: 84 > > org.openxmlformats.schemas.spreadsheetml.x2006.main.CTHyperlink$Factory.newInstance() > line: not available > > org.apache.poi.xssf.usermodel.XSSFHyperlink.<init>(org.apache.poi.common.usermodel.HyperlinkType) > line: 60 > > org.apache.poi.xssf.usermodel.XSSFCreationHelper.createHyperlink(org.apache.poi.common.usermodel.HyperlinkType) > line: 78 > > org.apache.poi.xssf.streaming.SXSSFCreationHelper.createHyperlink(org.apache.poi.common.usermodel.HyperlinkType) > line: 83 > > By looking into tacktrace one can see that there is chance to optimize memory > in Locale#getLocale() by taking advantage of > options.hasOption(USE_SAME_LOCALE). However this option is not documented – > what could be the side effects? > > Another opportunity is to refactor XSSFHyperlink so that it does not create > CTHyperlink immediately but postpones that until that is really needed > (SXSSFWorkbook#write() is called), more specifically, when > XSSFHyperlink#getCTHyperlink() is called. However SXSSFCell#setHyperlink() > misuses this method [3] in a sense that it should use API of XSSFHyperlink > the same way as XSSFCell#setHyperlink() does (see SXSSFCell.java.patch). > > Summarizing above I would say that the following could be improved right now: > > * Document USE_SAME_LOCALE option. Provide a description of what could go > wrong if this options is used, especially when used in conjunction with > UNSYNCHRONIZED. Also the option assumes that Locale object could be somehow > instantiated but it's constructor is private and one cannot do something like > POIXMLTypeLoader.DEFAULT_XML_OPTIONS.put(Locale.USE_SAME_LOCALE, > Locale.getLocale(XmlBeans.getContextTypeLoader(), > POIXMLTypeLoader.DEFAULT_XML_OPTIONS)); > * Apply minor correction to documentation [4]: > there are still things that still may consume a large amount of memory > based on which features you are using, e.g. merged regions, comments, ... > there are still things that still may consume a large amount of memory > based on which features you are using, e.g. merged regions, hyperlinks, > comments, ... > * Apply the attached SXSSFCell.java.patch. > > More advanced improvement of SXSSF would be to dump all problematic objects > (regions, hyperlinks, comments, ...) into temporary files and then merge them > into destination XSLX spreadsheet – then it would be a real streaming. > > POI v3.15. Links are to POI 3.12, but there are no conceptual differences. > > [1] > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi-ooxml/3.12/org/apache/poi/xssf/usermodel/XSSFSheet.java#137 > [2] > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.xmlbeans/xmlbeans/2.4.0/org/apache/xmlbeans/impl/store/Locale.java#213 > [3] > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi-ooxml/3.12/org/apache/poi/xssf/streaming/SXSSFCell.java#603 > [4] https://poi.apache.org/spreadsheet/how-to.html#sxssf > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
