https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #8 from Luca Martini <lucamart...@tagetik.com> ---
(In reply to Greg Woolsey from comment #5)
> Changes in r1817252
> 

Thank you very much Greg.


> Of this remaining time, about 2/3 is taken up in the formula evaluation
> caching and tracking mechanism.  Bypassing it for null cells causes test
> failures, which shows it is necessary, but relatively expensive.  It appears
> to try to optimize and minimize the "empty cell" rectangular regions it
> holds, but assumes processing by row then column.  That may be a memory/time
> optimization we want to consider allowing additional strategies for.
> 
> Note that this shortcut logic doesn't change the result of any methods, only
> avoids busywork that didn't apply to the "nonexistent cell" cases.
> 
> This doesn't optimize VLOOKUP directly, but is an improvement of about 70%
> sufficient?

I think so. That's more or less the fix I had in mind.

> 
> Changing the VLOOKUP code itself is actually significantly more complex,
> because POI handles sheets by row internally, and columns are second-class
> constructs.  There is no easy way to determine the last row with data in a
> column other than iterating over all defined rows.  With these
> optimizations, the extra iterations should fail fast.

I know, and I think that, with the current state of the data structure,
iterating over every defined row could be worse than your current solution.
We are still on POI 3.x here, but it should not be difficult to integrate your
changes into our forked version.
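
To illustrate the row-major point, here is a minimal sketch in plain Java
collections. It is not POI code: the nested TreeMap layout and the names
ColumnScan, lastRowWithData and sampleSheet are hypothetical and only model
"rows first, columns second". With such a layout, the only way to find the
last row holding data in a given column is to walk the defined rows.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnScan {
    // Hypothetical row-major sheet: row index -> (column index -> cell value).
    // Only rows and cells that actually exist are stored.
    static int lastRowWithData(TreeMap<Integer, TreeMap<Integer, String>> sheet, int column) {
        // Walk the defined rows from the bottom up and stop at the first row
        // that contains a cell in the requested column.
        for (Map.Entry<Integer, TreeMap<Integer, String>> row : sheet.descendingMap().entrySet()) {
            if (row.getValue().containsKey(column)) {
                return row.getKey();
            }
        }
        return -1; // no defined row has a cell in this column
    }

    // Small sample: rows 0, 3 and 7 are defined; all other rows do not exist.
    static TreeMap<Integer, TreeMap<Integer, String>> sampleSheet() {
        TreeMap<Integer, TreeMap<Integer, String>> sheet = new TreeMap<>();
        sheet.computeIfAbsent(0, r -> new TreeMap<>()).put(0, "a");
        sheet.computeIfAbsent(0, r -> new TreeMap<>()).put(1, "b");
        sheet.computeIfAbsent(3, r -> new TreeMap<>()).put(0, "c");
        sheet.computeIfAbsent(7, r -> new TreeMap<>()).put(1, "d");
        return sheet;
    }

    public static void main(String[] args) {
        TreeMap<Integer, TreeMap<Integer, String>> sheet = sampleSheet();
        System.out.println(lastRowWithData(sheet, 0));
        System.out.println(lastRowWithData(sheet, 1));
        System.out.println(lastRowWithData(sheet, 5));
    }
}
```

In the worst case (a column with no data at all) the scan touches every
defined row, which is why a per-column shortcut is not cheap in a row-major
structure.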

As far as I am concerned, the bug is resolved. I am not changing its status
yet because others still have pending comments.

Best regards,
    Luca

