Re: [PATCHES] poi as calculation engine

Javen O'Neal Tue, 30 May 2017 21:44:26 -0700

> How about we adopt a full lazy-evaluation approach for Formula objects?
I opened bug 61136 [1] and took a stab at full lazy evaluation [2].


Travis, would you be willing to test the memory consumption and
execution speed with these changes in HSSF and XSSF relative to trunk
and your patch (8ab388eb78)?

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=61136
[2] https://bz.apache.org/bugzilla/attachment.cgi?id=35014&action=diff

On Tue, May 30, 2017 at 8:49 PM, Javen O'Neal <[email protected]> wrote:
> How about we adopt a full lazy-evaluation approach for Formula objects?
> Either _byteEncoding and _encodedTokenLen should be provided or
> _ptgTokens should be provided. Only when the other is needed should
> Ptg.readTokens or Ptg.serializeTokens be called.
>
> As I understand it, reading and serializing tokens is pretty
> expensive, and eager evaluation is expensive if the results are never
> used.
>
> There should be two constructors:
>> private Formula(final byte[] byteEncoding, final int encodedTokenLen);
>> private Formula(final Ptg[] cachedTokens);
> I don't think a third constructor is necessary with the current code
> and enough lazy evaluation, but if one were needed it would be:
>> private Formula(final byte[] byteEncoding, final int encodedTokenLen, final Ptg[] cachedTokens);
>
> I think we're less likely to rewrite org.apache.poi.hssf.model with
> SpreadsheetVersion code since it would encourage incorrect usage (the
> mailing list would explode). However, if profiling can get to the
> bottom of why HSSF formula evaluation is faster (perhaps it's the
> expensive xmlbean reading and writing that's slowing XSSF down), then
> we could go down that avenue.
> 1. Find and fix the largest contributors to the overall formula
> evaluation time for your XSSF and HSSF test cases.
> 2. Create a GenericSSEvaluationWorkbook that stores sheets, cells, and
> rows in plain old java objects without the underlying HSSF bytestream
> or XSSF/SXSSF xml data structures that are needed for serialization.
> This wouldn't need to be a writeable workbook. This could plug into
> the current formula evaluation code.
> 3. Rewrite some of the XSSF classes that wrap an xmlbean so that they
> read all of their data out of the xmlbean up front and only recreate
> the bean when writing out. We want to go this direction with XSSF as it would make POI
> faster, use less memory, and make it easier to transition to a
> different XML library. Maybe I'm overstating, but long term, this
> would make it easier to merge HSSF, XSSF, SXSSF, and other SS
> interfaces, enabling format-agnostic classes that could convert
> between xls and xlsx.
> 4. Other ideas?
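
For readers without the patch at hand, the lazy-evaluation idea quoted
above can be sketched roughly as follows. This is only an illustration
under stated assumptions, not the code attached to bug 61136: the class
name LazyFormula, the factory methods, and the counters are hypothetical,
tokens are modeled as plain ints rather than Ptg objects, and the
parse/serialize helpers are trivial stand-ins for Ptg.readTokens and
Ptg.serializeTokens. The encodedTokenLen parameter is left out for
brevity.

```java
/**
 * Hypothetical stand-in for a lazily evaluated Formula.
 * Exactly one representation is supplied at construction time;
 * the other is computed on first use and then cached.
 */
final class LazyFormula {
    private byte[] byteEncoding; // serialized form, null until supplied or computed
    private int[] tokens;        // parsed form, null until supplied or computed

    // Instrumentation for the example only: how often the expensive steps ran.
    int parseCount = 0;
    int serializeCount = 0;

    private LazyFormula(byte[] byteEncoding, int[] tokens) {
        this.byteEncoding = byteEncoding;
        this.tokens = tokens;
    }

    /** Created from serialized bytes; parsing is deferred. */
    static LazyFormula fromBytes(byte[] encoding) {
        return new LazyFormula(encoding.clone(), null);
    }

    /** Created from parsed tokens; serialization is deferred. */
    static LazyFormula fromTokens(int[] tokens) {
        return new LazyFormula(null, tokens.clone());
    }

    /** Parses on first call only. Not thread-safe in this sketch. */
    int[] getTokens() {
        if (tokens == null) {
            tokens = parse(byteEncoding);
            parseCount++;
        }
        return tokens.clone();
    }

    /** Serializes on first call only. Not thread-safe in this sketch. */
    byte[] getEncoding() {
        if (byteEncoding == null) {
            byteEncoding = serialize(tokens);
            serializeCount++;
        }
        return byteEncoding.clone();
    }

    // Stand-in for the expensive token reader: one unsigned byte per token.
    private static int[] parse(byte[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i] & 0xFF;
        }
        return out;
    }

    // Stand-in for the expensive token serializer: low byte of each token.
    private static byte[] serialize(int[] toks) {
        byte[] out = new byte[toks.length];
        for (int i = 0; i < toks.length; i++) {
            out[i] = (byte) toks[i];
        }
        return out;
    }
}
```

With this shape, a formula that is read from a file and written back
without being evaluated never pays for parsing, and a formula built from
tokens and only evaluated never pays for serialization.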

