[Bug 59432] createName is slow when there are many Names (10000)

bugzilla Tue, 17 May 2016 07:35:56 -0700

https://bz.apache.org/bugzilla/show_bug.cgi?id=59432


--- Comment #1 from Javen O'Neal <[email protected]> ---
Adding a name requires checking the name manager for existing names to avoid
defining the same name at the same scope (my guess is this would result in a
corrupt workbook). POI uses a naïve implementation, shown in comment 0, which
requires O(n) time for each name added. We could perform this check in
O(1) using a hash table with a hashable tuple (scope, name) as the key. A less
elegant, inferior solution that also runs in O(1) uses nested hash tables: the
first layer having scope (sheet name or global) keys and the second layer having
name keys (inner and outer keys could be swapped). This comes at the cost of
higher memory consumption and increased code complexity (and therefore a
higher chance of bugs).
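
To make the tuple-key idea concrete, here is a minimal sketch of such an index.
It is not POI code: NameIndex, Key and GLOBAL_SCOPE are hypothetical names, and
it assumes that scope is a sheet index with -1 meaning workbook (global) scope
and that names compare case-insensitively within a scope.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

/** Hypothetical index of defined names keyed by (scope, name); not part of POI. */
final class NameIndex {
    /** Assumed convention: -1 stands for workbook (global) scope. */
    static final int GLOBAL_SCOPE = -1;

    /** Hashable (scope, name) tuple used as the set key. */
    private static final class Key {
        private final int scope;
        private final String name;

        Key(int scope, String name) {
            this.scope = scope;
            // Assumption: names are case-insensitive, so normalize before hashing.
            this.name = name.toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return scope == other.scope && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scope, name);
        }
    }

    private final Set<Key> keys = new HashSet<>();

    /** Registers the name; returns false if (scope, name) is already taken. */
    boolean add(int scope, String name) {
        return keys.add(new Key(scope, name));
    }

    /** Expected constant-time duplicate check. */
    boolean contains(int scope, String name) {
        return keys.contains(new Key(scope, name));
    }

    /** Must be called when a name is deleted or renamed, to keep the index in sync. */
    boolean remove(int scope, String name) {
        return keys.remove(new Key(scope, name));
    }
}
```

The same name may exist once per scope, so add(GLOBAL_SCOPE, "Total") and
add(0, "Total") both succeed, while a second add(GLOBAL_SCOPE, "TOTAL") is
rejected. The extra bookkeeping in remove is the kind of added complexity
mentioned above: every rename, scope change, or deletion has to update the
index as well.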

Given the code from comment 0, I'm not surprised that adding N names is slow,
as it is performing O(N²) operations.
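
To put a number on that: if each new name is compared against every existing
name, adding N distinct names costs 0 + 1 + ... + (N-1) = N(N-1)/2 comparisons,
which is 49,995,000 for N = 10000. The sketch below just counts those
comparisons; it models a plain linear scan and is not POI's actual code
(LinearScanCost and its method are hypothetical names).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a linear duplicate check; not POI code. */
final class LinearScanCost {
    /**
     * Adds n distinct names, scanning all existing names before each insert,
     * and returns the total number of name comparisons performed.
     */
    static long comparisonsForLinearScan(int n) {
        List<String> names = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            String candidate = "name" + i;
            // O(current size) duplicate check, repeated for every new name.
            for (String existing : names) {
                comparisons++;
                if (existing.equalsIgnoreCase(candidate)) {
                    throw new IllegalArgumentException("duplicate: " + candidate);
                }
            }
            names.add(candidate);
        }
        return comparisons;
    }
}
```

With a hash-based check the same workload needs one lookup per name, so N
lookups in total instead of N(N-1)/2 comparisons.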

Here's what you could do:
1) Provide a patch with a speed-optimized implementation with 100% test
coverage.
2) Provide a patch with a non-validating version of setNameName (probably
called setNameNameUnsafe) [1]
3) Access the CT* classes yourself, either with reflection, subclassing, or
forking POI, which gives you direct access to the CTName data structure. This
would complicate upgrading POI in the future.

[1] Relevant discussion on dev@poi mailing list
http://apache-poi.1045710.n5.nabble.com/Preventing-corrupt-workbooks-td5722973.html

