https://issues.apache.org/bugzilla/show_bug.cgi?id=55611
Bug ID: 55611
Summary: Performance improvement (~7% up to ~27%) by adding a cache to DateUtil.isADateFormat(int, String)
Product: POI
Version: 3.9
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: HSSF
Assignee: [email protected]
Reporter: [email protected]
Created attachment 30894
--> https://issues.apache.org/bugzilla/attachment.cgi?id=30894&action=edit
Patch for poi-3.9
We found an easy way to improve POI's performance. The idea is to avoid
re-checking in DateUtil.isADateFormat(int, String) whether a given format
string represents a date format when the same string is passed on
consecutive calls.
This can be done safely by adding a single-entry static cache: each call
checks whether the parameters changed since the previous call and, if they
did, invalidates the cache.
Our attached patch first checks whether the format string and format index
are the same as in the previous call. If they are, the cached result is
reused; otherwise the real check is executed and the required data is stored.
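The caching idea described above can be sketched roughly as follows. This is
only an illustration, not the attached patch: the class name
SingleEntryFormatCache, the helper expensiveCheck, and the slowPathCalls
counter are hypothetical, and the body of expensiveCheck is a placeholder
heuristic rather than POI's actual date-format logic. The method is marked
synchronized as one assumed way to keep the static cache consistent when
several threads call it.

```java
import java.util.Objects;

// Hypothetical sketch of a single-entry static cache in front of an
// expensive format check. Not the code from the attached patch.
final class SingleEntryFormatCache {
    // Arguments and result of the most recent call (the single cache entry).
    private static boolean hasEntry = false;
    private static int lastFormatIndex;
    private static String lastFormatString;
    private static boolean lastResult;

    // Counts slow-path executions; present for demonstration only.
    static int slowPathCalls = 0;

    private SingleEntryFormatCache() {}

    // Placeholder for the real, costly check (assumption: a trivial
    // heuristic stands in for POI's parsing of the format string).
    private static boolean expensiveCheck(int formatIndex, String formatString) {
        slowPathCalls++;
        if (formatString == null) {
            return false;
        }
        String s = formatString.toLowerCase();
        return s.contains("y") || s.contains("d") || s.contains("h");
    }

    // Returns the cached result when both arguments match the previous
    // call; otherwise runs the real check and overwrites the cache entry.
    static synchronized boolean isADateFormat(int formatIndex, String formatString) {
        if (hasEntry
                && formatIndex == lastFormatIndex
                && Objects.equals(formatString, lastFormatString)) {
            return lastResult;
        }
        boolean result = expensiveCheck(formatIndex, formatString);
        lastFormatIndex = formatIndex;
        lastFormatString = formatString;
        lastResult = result;
        hasEntry = true;
        return result;
    }
}
```

With this sketch, calling SingleEntryFormatCache.isADateFormat(14, "m/d/yy")
twice in a row runs the slow path at most once; a following call with
different arguments, such as (2, "0.00"), replaces the cached entry. A cache
of this shape only helps when consecutive calls tend to repeat the same
format, as when many cells share one style.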
For example, when running POI 3.9 on a small document (~40 KB) and on a larger
document (~13.5 MB), the patch reduces the running time, giving a speedup of
~7% in the first case and ~12% in the second.
Additionally, we ran an experiment using this patch with Tika 1.3: a test on a
set of nine documents (~13.9 MB) showed a 27% speedup.