https://bz.apache.org/bugzilla/show_bug.cgi?id=66418
Bug ID: 66418
Summary: Performance/Scalability issue in XSSFSheet.groupRow
Product: POI
Version: 5.2.2-FINAL
Hardware: All
OS: Mac OS X 10.1
Status: NEW
Severity: normal
Priority: P2
Component: XSSF
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
We've noticed a scalability issue when a sheet has a large number of grouped
rows. We call XSSFSheet.groupRow() once per group. That implementation sets the
outline level on each row in the range, but then it also calls
setSheetFormatPrOutlineLevelRow(), which looks at EVERY XSSFRow / CTRow in the
sheet to get its outline level. So if I have, say, 100,000 rows, and I call
groupRow() 10,000 times, it will read a CTRow outline level 1 billion times
(100,000 x 10,000). This can be quite slow.
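To make the cost concrete, here is a minimal sketch that models the behaviour described above with a plain map instead of POI objects. The class GroupRowCostModel and its read counter are hypothetical and exist only for illustration; this is not the XSSFSheet source, just a stand-in that rescans every row after each groupRow() call.

```java
import java.util.TreeMap;

/** Toy stand-in for a sheet. This is NOT Apache POI code. */
public class GroupRowCostModel {
    // row index -> outline level (stands in for the per-row CTRow value)
    private final TreeMap<Integer, Integer> outlineLevels = new TreeMap<>();
    private int sheetOutlineLevel;
    private long levelReads; // number of per-row reads done by the rescans

    public GroupRowCostModel(int rowCount) {
        for (int i = 0; i < rowCount; i++) {
            outlineLevels.put(i, 0);
        }
    }

    /** Bumps the level of each row in [fromRow, toRow], then rescans the whole sheet. */
    public void groupRow(int fromRow, int toRow) {
        for (int i = fromRow; i <= toRow; i++) {
            outlineLevels.merge(i, 1, Integer::sum);
        }
        recalcSheetOutlineLevel();
    }

    // Assumed equivalent of the sheet-level recalculation: one read per existing row.
    private void recalcSheetOutlineLevel() {
        int max = 0;
        for (int level : outlineLevels.values()) {
            levelReads++;
            max = Math.max(max, level);
        }
        sheetOutlineLevel = max;
    }

    public int getSheetOutlineLevel() { return sheetOutlineLevel; }

    public long getLevelReads() { return levelReads; }

    public static void main(String[] args) {
        int rows = 10_000;
        int groups = 1_000;
        GroupRowCostModel sheet = new GroupRowCostModel(rows);
        for (int g = 0; g < groups; g++) {
            sheet.groupRow(g * 10, g * 10 + 9);
        }
        // One full scan per call: rows * groups reads in total.
        System.out.println(sheet.getLevelReads() + " reads for " + groups + " groupRow() calls");
    }
}
```

The read count grows with (number of rows) x (number of groupRow() calls), which is the scaling problem reported here.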
I can think of a couple of different approaches to fixing this; I would
appreciate some input, and I can try to provide a patch.
Option 1: Provide another groupRow method that does NOT calculate the
sheet-level outline level, for example:
public void groupRow(int fromRow, int toRow, boolean autoCalculateSheetOutlineLevel)
(If autoCalculateSheetOutlineLevel is true, it will behave as it does today; if
false, it will skip execution of setSheetFormatPrOutlineLevelRow()).
Then in XSSFSheet, make this method public:
private void setSheetFormatPrOutlineLevelRow()
or expose another public method that wraps it.
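A rough sketch of how Option 1 could behave, again on a toy model rather than on XSSFSheet itself. The class name Option1Sketch and the public method recalculateSheetOutlineLevel() are hypothetical names chosen for illustration; only the three-argument groupRow signature comes from the proposal above.

```java
import java.util.TreeMap;

/** Toy model of Option 1. This is NOT Apache POI code. */
public class Option1Sketch {
    // row index -> outline level; rows are created on demand
    private final TreeMap<Integer, Integer> outlineLevels = new TreeMap<>();
    private int sheetOutlineLevel;

    /** Existing behaviour: always recalculate the sheet-level value. */
    public void groupRow(int fromRow, int toRow) {
        groupRow(fromRow, toRow, true);
    }

    /** Proposed overload: the caller may skip the full-sheet rescan. */
    public void groupRow(int fromRow, int toRow, boolean autoCalculateSheetOutlineLevel) {
        for (int i = fromRow; i <= toRow; i++) {
            outlineLevels.merge(i, 1, Integer::sum);
        }
        if (autoCalculateSheetOutlineLevel) {
            recalculateSheetOutlineLevel();
        }
    }

    /** Hypothetical public wrapper around the sheet-level recalculation. */
    public void recalculateSheetOutlineLevel() {
        int max = 0;
        for (int level : outlineLevels.values()) {
            max = Math.max(max, level);
        }
        sheetOutlineLevel = max;
    }

    public int getSheetOutlineLevel() { return sheetOutlineLevel; }

    public static void main(String[] args) {
        Option1Sketch sheet = new Option1Sketch();
        for (int g = 0; g < 1_000; g++) {
            sheet.groupRow(g * 10, g * 10 + 9, false); // no rescan per call
        }
        sheet.recalculateSheetOutlineLevel();          // single rescan at the end
        System.out.println("sheet outline level: " + sheet.getSheetOutlineLevel());
    }
}
```

The trade-off is that a caller who passes false and forgets the final recalculation leaves the sheet-level value stale.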
Option 2:
Create a new method on XSSFSheet that takes a list of from/to row indexes, so
that it can set many outline levels and then call
setSheetFormatPrOutlineLevelRow() just once at the end. Maybe something like:
public void groupRows(List<Interval> rowGroups)
where Interval would be a small class to encapsulate the "from" and "to" row
indexes.
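A rough sketch of Option 2 on the same kind of toy model. Option2Sketch is a hypothetical class, and the field names on Interval are assumptions; only the groupRows(List<Interval>) signature and the idea of a single recalculation at the end come from the proposal above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy model of Option 2. This is NOT Apache POI code. */
public class Option2Sketch {
    /** Small value class holding the inclusive "from" and "to" row indexes. */
    public static final class Interval {
        final int fromRow;
        final int toRow;

        public Interval(int fromRow, int toRow) {
            this.fromRow = fromRow;
            this.toRow = toRow;
        }
    }

    // row index -> outline level; rows are created on demand
    private final TreeMap<Integer, Integer> outlineLevels = new TreeMap<>();
    private int sheetOutlineLevel;

    /** Applies every group first, then recalculates the sheet-level value once. */
    public void groupRows(List<Interval> rowGroups) {
        for (Interval group : rowGroups) {
            for (int i = group.fromRow; i <= group.toRow; i++) {
                outlineLevels.merge(i, 1, Integer::sum);
            }
        }
        int max = 0;
        for (int level : outlineLevels.values()) {
            max = Math.max(max, level);
        }
        sheetOutlineLevel = max;
    }

    public int getSheetOutlineLevel() { return sheetOutlineLevel; }

    public static void main(String[] args) {
        Option2Sketch sheet = new Option2Sketch();
        // Two nested groups and one separate group.
        sheet.groupRows(Arrays.asList(
                new Interval(0, 9), new Interval(2, 5), new Interval(20, 29)));
        System.out.println("sheet outline level: " + sheet.getSheetOutlineLevel());
    }
}
```

Compared with Option 1, this keeps the sheet-level value consistent without relying on the caller to trigger the recalculation.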
Thoughts?