Martin, I will give my feedback later, when I have time to play with the code. At first glance it looks like a very promising improvement, thank you very much. I agree that ensureRowOrdering() is expensive and not needed at all when you create rows in increasing order. If we get rid of it it will be just great!
Regards, Yegor On Fri, Aug 5, 2011 at 2:45 PM, Martin Studer <[email protected]> wrote: > Dear Developers, > > > > I've recently encountered a performance issue when writing large xlsx files > (~ 10^5 rows and about 10 columns). Looking at the POI mailing list I found > a discussion that was going in May/June 2010: > > http://apache-poi.1045710.n5.nabble.com/Performance-Question-with-CTSheetDat > aImpl-java-td2335065.html > > > > The thread describes some improvements that were implemented to check > whether the CTRows have to be reordered or not > (XSSFSheet#ensureRowOrdering). However, from what I've found, already the > checks that have to be performed to see if rows have to be reordered are > quite expensive - especially the calls to sheetData.getRowArray. I > understand that adapting the XMLBeans side of things is not straight forward > / practical. However, the thread also describes an alternative approach > formulated by Yegor which is "to improve XSSFSheet#createRow so that the > array of CTRow beans is kept in a sorted state". To me this seems like a > very good idea. Looking at the current codebase it seems like this has not > been implemented so far, probably due to other side-effects such a change > would cause? Given that the sheet-internal _rows object already is an > ordered TreeMap, Yegor's alternative approach could probably be implemented > like this: > > > > public XSSFRow createRow(int rownum) { > > CTRow ctRow; > > XSSFRow prev = _rows.get(rownum); > > if(prev != null){ > > ctRow = prev.getCTRow(); > > ctRow.set(CTRow.Factory.newInstance()); > > } else { > > if(rownum > _rows.lastKey()) { > > // we can append the new row at the end > > ctRow = > worksheet.getSheetData().addNewRow(); > > } > > else { > > // random access case ... > > // get number of rows where row index < > rownum > > // --> this tells us where our row should go > > int idx = _rows.headMap(rownum).size(); > > ctRow = > worksheet.getSheetData().insertNewRow(idx); > > } > > } > > XSSFRow r = new XSSFRow(ctRow, this); > > r.setRowNum(rownum); > > _rows.put(rownum, r); > > return r; > > } > > > > > > In case the row does not exist yet, we check whether the row to be inserted > is the "highest" row (= has the largest row number). In this case we can > simply append the row to the CTSheetData via the addNewRow method. Otherwise > we check how many rows there are which have a smaller index. Given the > invariant that the CTRows are ordered, this number tells us where we have to > insert the new row (via the insertNewRow method). > > > > What is your opinion on that? Do you think this would work without causing > other issues? Note, I have not been testing this as I'm still trying to > figure out how set up the checked out POI code with Eclipse (if somebody has > some pointers on how to do that I would be very greatful). > > > > Have a good weekend, > > Martin > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
