> How about we adopt a full lazy-evaluation approach for Formula objects?

I opened bug 61136 [1] and took a stab at full lazy evaluation [2].
Travis, would you be willing to test the memory consumption and execution
speed with these changes in HSSF and XSSF relative to trunk and your patch
(8ab388eb78)?

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=61136
[2] https://bz.apache.org/bugzilla/attachment.cgi?id=35014&action=diff

On Tue, May 30, 2017 at 8:49 PM, Javen O'Neal wrote:
> How about we adopt a full lazy-evaluation approach for Formula objects?
> Either _byteEncoding and _encodedTokenLen should be provided or
> _ptgTokens should be provided. Only when the other is needed should
> Ptg.readTokens or Ptg.serializeTokens be called.
>
> As I understand it, reading and serializing tokens is pretty
> expensive, and eager evaluation is expensive if the results are never
> used.
>
> There should be:
>> private Formula(final byte[] byteEncoding, final int encodedTokenLen);
>> private Formula(final Ptg[] cachedTokens);
> I don't think a third constructor is necessary with the current code
> and enough lazy evaluation, but here's a 3rd constructor
>> private Formula(final byte[] byteEncoding, final int encodedTokenLen, final
>> Ptg[] cachedTokens);
>
> I think we're less likely to rewrite org.apache.poi.hssf.model with
> SpreadsheetVersion code since it would encourage incorrect usage (the
> mailing list would explode). However, if you can get to the bottom of
> why HSSF formula evaluation is faster with some profiling (perhaps
> it's the expensive xmlbean reading and writing that's slowing XSSF
> down), then we could go that avenue.
> 1. Find and fix the largest contributors to the overall formula
> evaluation time for your XSSF and HSSF test cases.
> 2. Create a GenericSSEvaluationWorkbook that stores sheets, cells, and
> rows in plain old java objects without the underlying HSSF bytestream
> or XSSF/SXSSF xml data structures that are needed for serialization.
> This wouldn't need to be a writeable workbook. This could plug into
> the current formula evaluation code.
> 3. Rewrite some of the XSSF classes that wrap an xmlbean to fully read
> their data from an xmlbean, and only recreate the bean when writing
> out. We want to go this direction with XSSF as it would make POI
> faster, use less memory, and make it easier to transition to a
> different XML library. Maybe I'm overstating, but long term, this
> would make it easier to merge HSSF, XSSF, SXSSF, and other SS
> interfaces, enabling format-agnostic classes that could convert
> between xls and xlsx.
> 4. Other ideas?
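
For anyone following along, here is a rough, self-contained sketch of the lazy-evaluation idea quoted above: the object is built from either the byte encoding or the tokens, and the other form is computed only on first request and then cached. This is not the patch attached to bug 61136 and not POI's actual Formula class. The class name LazyFormulaSketch, the factory methods, the counters, and the toy one-byte-per-token codec are all assumptions made for illustration; int[] stands in for Ptg[], and the private readTokens/serializeTokens helpers stand in for Ptg.readTokens and Ptg.serializeTokens.

```java
import java.util.Arrays;

/** Illustrative only: a hypothetical stand-in for a lazily evaluated Formula. */
final class LazyFormulaSketch {
    private byte[] byteEncoding;   // null until supplied or serialized on demand
    private int encodedTokenLen;   // number of leading bytes that encode tokens
    private int[] tokens;          // null until supplied or parsed on demand (stand-in for Ptg[])
    private int parseCount;        // how many times the expensive parse ran
    private int serializeCount;    // how many times the expensive serialize ran

    private LazyFormulaSketch(byte[] byteEncoding, int encodedTokenLen, int[] tokens) {
        this.byteEncoding = byteEncoding;
        this.encodedTokenLen = encodedTokenLen;
        this.tokens = tokens;
    }

    /** Mirrors the proposed (byteEncoding, encodedTokenLen) constructor; nothing is parsed here. */
    static LazyFormulaSketch fromBytes(byte[] byteEncoding, int encodedTokenLen) {
        if (encodedTokenLen < 0 || encodedTokenLen > byteEncoding.length) {
            throw new IllegalArgumentException("encodedTokenLen out of range");
        }
        return new LazyFormulaSketch(byteEncoding.clone(), encodedTokenLen, null);
    }

    /** Mirrors the proposed (cachedTokens) constructor; nothing is serialized here. */
    static LazyFormulaSketch fromTokens(int[] tokens) {
        return new LazyFormulaSketch(null, 0, tokens.clone());
    }

    /** Parses the bytes on first use, then returns the cached tokens. */
    int[] getTokens() {
        if (tokens == null) {
            tokens = readTokens(byteEncoding, encodedTokenLen);
            parseCount++;
        }
        return tokens.clone();
    }

    /** Serializes the tokens on first use, then returns the cached bytes. */
    byte[] getBytes() {
        if (byteEncoding == null) {
            byteEncoding = serializeTokens(tokens);
            encodedTokenLen = byteEncoding.length;
            serializeCount++;
        }
        return byteEncoding.clone();
    }

    int parseCount() { return parseCount; }
    int serializeCount() { return serializeCount; }

    // Toy codec (assumption): one unsigned byte per token.
    private static int[] readTokens(byte[] data, int len) {
        int[] out = new int[len];
        for (int i = 0; i < len; i++) {
            out[i] = data[i] & 0xFF;
        }
        return out;
    }

    private static byte[] serializeTokens(int[] toks) {
        byte[] out = new byte[toks.length];
        for (int i = 0; i < toks.length; i++) {
            out[i] = (byte) toks[i];
        }
        return out;
    }
}
```

With this shape, a formula that is read from a file and written back without being evaluated never pays for parsing, and one built from tokens never pays for serialization unless it is written out. The sketch is not thread-safe; a real implementation would need to decide whether the lazy fields require synchronization.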
