the purpose of calling CTSheetData#setRowArray is to ensure that the array of CTRow beans in CTSheetData is ordered. The point is that we don't always need this operation, in many cases rows are already ordered and re-assigning them to CTSheetData is an unnecessary expensive operation.

Compare two use cases:

case 1: rows are written in increasing order

XSSFSheet sheet = workbook.createSheet();
sheet.createRow(1);
sheet.createRow(2);
sheet.createRow(3);
workbook.write(out); //no need to re-order rows before writing


case 2: row are created in random order

XSSFSheet sheet = workbook.createSheet();
sheet.createRow(3);
sheet.createRow(2);
sheet.createRow(1);
workbook.write(out); //need to re-order rows before writing

In the first case the call of CTSheetData#setRowArray is extra because the state of CTSheetData match the logical model. In the second case we do need to call CTSheetData#setRowArray to ensure that rows are written in ascending order.

So, the performance of XSSFSheet#write depends on how rows were added: if rows are added in strict ascending order then XSSFSheet#write is fast. If the order of rows is random or rows were shifted then XSSFSheet#write involves CTSheetData#setRowArray and becomes more expensive.

I committed this improvement in r947542. Please try the latest build from trunk.

Below are two my tests, the only difference is that in the first test rows are written in ascending order and in the second test in descending.

    public static void main(String[] args) throws Exception {
        long t1 = System.currentTimeMillis();

        XSSFWorkbook wb = new XSSFWorkbook();
        XSSFSheet sh = wb.createSheet();
        for (int i = 0; i < Short.MAX_VALUE; i++) {
            XSSFRow row = sh.createRow(i);
            for (int j = 0; j < 5; j++) {
                XSSFCell cell = row.createCell(j);
                cell.setCellValue(i*j);
            }
        }
        long t2 = System.currentTimeMillis();
        System.out.println("generate: " + (t2-t1) + " ms");

FileOutputStream out = new FileOutputStream(new File("/temp/rows.xlsx"));
        wb.write(out);
        out.close();
        long t3 = System.currentTimeMillis();
        System.out.println("write: " + (t3-t2) + " ms");
    }


generate: 5488 ms
write: 2507 ms

    public static void main(String[] args) throws Exception {
        long t1 = System.currentTimeMillis();

        XSSFWorkbook wb = new XSSFWorkbook();
        XSSFSheet sh = wb.createSheet();
        for (int i = Short.MAX_VALUE; i > 0; i--) {
            XSSFRow row = sh.createRow(i);
            for (int j = 0; j < 5; j++) {
                XSSFCell cell = row.createCell(j);
                cell.setCellValue(i*j);
            }
        }
        long t2 = System.currentTimeMillis();
        System.out.println("generate: " + (t2-t1) + " ms");

FileOutputStream out = new FileOutputStream(new File("/temp/rows.xlsx"));
        wb.write(out); //will involve CTSheetData#setRowArray
        out.close();
        long t3 = System.currentTimeMillis();
        System.out.println("write: " + (t3-t2) + " ms");
    }

generate: 9291 ms
write: 94888 ms

The first test is almost 40X faster!

Regards,
Yegor



Continuing to chip away at CTSheetDataImpl's performance issues.
However, the problematic code is unlikely to be changed, I have a discussion 
going on the XMLBeans website, and as it turns out that code is most likely not 
going to change, or will not change easily.


With that in mind, I started taking a close look at the POI side of things and 
very specifically the problematic code from a POI perspective.


At approximately line 2368 of XSSFSheet.java

the following line is the suspect line:
I have modified the code to break 1 line into 3  the slow line is the last one.

         CTRow[] lctRow = new CTRow[rArray.size()];
        lctRow = rArray.toArray(lctRow);
         sheetData.setRowArray(lctRow);



My Question is:  in looking at XSSFSheet.java
are there any assumption that we can make about the state of sheetData (A 
CTSheetData object)
that would allow us to either overwrite the default setRowArray(lctRow) method,
or side step this call all together by calling a method that is not generic, 
but instead much
more targeted to the function and process that is currently going on.








Here is the full call in context...Take from my modified version of 
XSSFSheet.java


     protected void write(OutputStream out) throws IOException {

         if(worksheet.getColsArray().length == 1) {
             CTCols col = worksheet.getColsArray(0);
             CTCol[] cols = col.getColArray();
             if(cols.length == 0) {
                 worksheet.setColsArray(null);
             }
         }

         // Now re-generate our CTHyperlinks, if needed
         if(hyperlinks.size()>  0) {
             if(worksheet.getHyperlinks() == null) {
                 worksheet.addNewHyperlinks();
             }
             CTHyperlink[] ctHls = new CTHyperlink[hyperlinks.size()];
             for(int i=0; i<ctHls.length; i++) {
                 // If our sheet has hyperlinks, have them add
                 //  any relationships that they might need
                 XSSFHyperlink hyperlink = hyperlinks.get(i);
                 hyperlink.generateRelationIfNeeded(getPackagePart());
                 // Now grab their underling object
                 ctHls[i] = hyperlink.getCTHyperlink();
             }
             worksheet.getHyperlinks().setHyperlinkArray(ctHls);
         }

         CTSheetData sheetData = worksheet.getSheetData();
         ArrayList<CTRow>  rArray = new ArrayList<CTRow>(rows.size());
         for(XSSFRow row : rows.values()){
             row.onDocumentWrite();
             rArray.add(row.getCTRow());
         }
                long startTimeSheetData = System.currentTimeMillis();
                CTRow[] lctRow = new CTRow[rArray.size()];
                lctRow = rArray.toArray(lctRow);
         sheetData.setRowArray(lctRow);
                System.out.println("Sheet Time : " + 
(System.currentTimeMillis() - startTimeSheetData));


         XmlOptions xmlOptions = new XmlOptions(DEFAULT_XML_OPTIONS);
         xmlOptions.setSaveSyntheticDocumentElement(new 
QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
         Map<String, String>  map = new HashMap<String, String>();
         map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
         xmlOptions.setSaveSuggestedPrefixes(map);
                long startTimeWorkSheetSave = System.currentTimeMillis();

         worksheet.save(out, xmlOptions);
                System.out.println("Worksheet save Time : " + 
(System.currentTimeMillis() - startTimeWorkSheetSave));

     }



Best Regards
Bryce Alcock


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]




---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to