(posted to sc and performance lists; follow-ups should be on the
performance list)
Hi everyone,
For a while, there has been an idea for speeding up the saving of small
changes to large spreadsheet documents: generate XML elements only for
the changed cells.
I did a little experiment to find out how much time this can really
save. It's not a complete implementation, just a test to see what's
possible. This is what I changed:
- Loading of the file is unchanged. No additional information is kept in
memory after the file is loaded.
- What I keep in memory is the list of changes, as long as only
supported changes are made. For this test, only input of numbers into
cells that weren't empty before is supported. If any other change is
made, the list is discarded and the normal save code is used (a small
sketch of such a change list follows after this list). For simplicity,
it is assumed that the input doesn't change the cell's format. Changed
formula results are ignored (in a real implementation, they would have
to be tracked, too).
- When the file is saved and the content.xml stream is generated, the
old content.xml is first opened and parsed for the positions of the
affected cell elements (see below for details). Then the output is
generated by copying the unaffected parts and inserting a new, simple
cell element in place of each affected cell (also sketched below).
- The styles.xml stream is copied in unchanged form.
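
To make the second point concrete, here is a minimal sketch of how such
a change list could be kept and invalidated. This is not the actual CWS
code; all names here (CellPos, ChangeList and so on) are invented for
illustration:

#include <vector>

struct CellPos { int nTab; int nCol; int nRow; };

struct NumericInput
{
    CellPos aPos;       // which cell was changed
    double  fValue;     // its new numeric value
};

class ChangeList
{
    std::vector<NumericInput> aChanges;
    bool bValid = true;     // false -> fall back to a full save

public:
    // called from the supported input path only
    void AddNumericInput( const CellPos& rPos, double fValue )
    {
        if ( bValid )
            aChanges.push_back( { rPos, fValue } );
    }

    // called for every other kind of modification
    void Invalidate()
    {
        bValid = false;
        aChanges.clear();
    }

    bool CanSaveIncrementally() const { return bValid && !aChanges.empty(); }
    const std::vector<NumericInput>& GetChanges() const { return aChanges; }
};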
To find the affected cell elements in the XML stream quickly, expat's C
interface is used, searching for the qualified element names table:table,
table:table-row, table:table-cell and table:covered-table-cell. Because
this is just an optimization, I can assume that our own namespace
prefixes are used. Parsing is stopped as soon as the last affected cell
has been found.
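
For illustration, a stripped-down sketch of that scan, using expat calls
that really exist (XML_ParserCreate, XML_SetElementHandler,
XML_GetCurrentByteIndex, XML_GetCurrentByteCount, XML_StopParser). It
locates a single affected cell by counting rows and cells; a real
version would also have to handle table:table elements for multiple
sheets, the number-rows-repeated/number-columns-repeated attributes, and
a whole list of affected cells. The function and struct names are made
up for this example:

#include <expat.h>
#include <cstdio>
#include <cstring>

struct ScanState
{
    XML_Parser parser;
    int  nTargetRow, nTargetCol;         // 0-based position of the cell
    int  nCurRow = -1, nCurCol = -1;
    long nCellStart = -1, nCellEnd = -1; // byte range in the old stream
    bool bInTarget = false;
};

static bool lcl_IsCell( const XML_Char* pName )
{
    return std::strcmp( pName, "table:table-cell" ) == 0 ||
           std::strcmp( pName, "table:covered-table-cell" ) == 0;
}

static void XMLCALL lcl_Start( void* pData, const XML_Char* pName,
                               const XML_Char** /*pAtts*/ )
{
    ScanState* p = static_cast<ScanState*>( pData );
    if ( std::strcmp( pName, "table:table-row" ) == 0 )
    {
        ++p->nCurRow;
        p->nCurCol = -1;
    }
    else if ( lcl_IsCell( pName ) )
    {
        ++p->nCurCol;
        if ( p->nCurRow == p->nTargetRow && p->nCurCol == p->nTargetCol )
        {
            p->bInTarget = true;
            // byte offset of the '<' of this cell's start tag
            p->nCellStart = (long) XML_GetCurrentByteIndex( p->parser );
        }
    }
}

static void XMLCALL lcl_End( void* pData, const XML_Char* pName )
{
    ScanState* p = static_cast<ScanState*>( pData );
    if ( p->bInTarget && lcl_IsCell( pName ) )
    {
        // end of the end tag (for an empty "<.../>" element, both
        // handlers see the same token)
        p->nCellEnd = (long) ( XML_GetCurrentByteIndex( p->parser )
                               + XML_GetCurrentByteCount( p->parser ) );
        p->bInTarget = false;
        XML_StopParser( p->parser, XML_FALSE );  // last cell found, stop
    }
}

bool ScanOldContent( std::FILE* pIn, ScanState& rState )
{
    rState.parser = XML_ParserCreate( nullptr );  // no namespace processing:
                                                  // we rely on our own prefixes
    XML_SetUserData( rState.parser, &rState );
    XML_SetElementHandler( rState.parser, lcl_Start, lcl_End );

    char aBuf[8192];
    size_t nRead;
    while ( ( nRead = std::fread( aBuf, 1, sizeof(aBuf), pIn ) ) > 0 )
        if ( XML_Parse( rState.parser, aBuf, (int)nRead, 0 ) != XML_STATUS_OK )
            break;      // parse error, or stopped by XML_StopParser above
    XML_ParserFree( rState.parser );
    return rState.nCellEnd >= 0;
}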
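
With those byte ranges known, the output generation described above
becomes plain copying and splicing. Again a sketch with invented names
(Replacement, SpliceContent); the new cell element string is assumed to
come from the normal export code, and the old stream is only read
forward, so it can be decompressed on the fly:

#include <cstdio>
#include <string>
#include <vector>

struct Replacement
{
    long nStart;            // byte offset of the old cell's start tag
    long nEnd;              // byte offset just past the old cell's end tag
    std::string aNewXml;    // new cell element from the normal export code
};

// copies nCount bytes from pIn to pOut; with pOut == nullptr, the bytes
// are read and discarded (used to skip the old cell element)
static void lcl_CopyBytes( std::FILE* pIn, std::FILE* pOut, long nCount )
{
    char aBuf[8192];
    while ( nCount > 0 )
    {
        size_t nWant = nCount < (long)sizeof(aBuf) ? (size_t)nCount
                                                   : sizeof(aBuf);
        size_t nRead = std::fread( aBuf, 1, nWant, pIn );
        if ( nRead == 0 )
            break;                      // unexpected end of stream
        if ( pOut )
            std::fwrite( aBuf, 1, nRead, pOut );
        nCount -= (long)nRead;
    }
}

// rRepl must be sorted by nStart and non-overlapping;
// nOldSize is the uncompressed size of the old content.xml
void SpliceContent( std::FILE* pIn, std::FILE* pOut, long nOldSize,
                    const std::vector<Replacement>& rRepl )
{
    long nPos = 0;
    for ( const Replacement& r : rRepl )
    {
        lcl_CopyBytes( pIn, pOut, r.nStart - nPos );        // unaffected part
        lcl_CopyBytes( pIn, nullptr, r.nEnd - r.nStart );   // skip old cell
        std::fwrite( r.aNewXml.data(), 1, r.aNewXml.size(), pOut );
        nPos = r.nEnd;
    }
    lcl_CopyBytes( pIn, pOut, nOldSize - nPos );            // remaining tail
}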
For testing, I used a large, simple file with 500000 cells, containing
only text and numbers.
In CPU time (measured with callgrind), normal saving takes 20.1 billion
cycles. Incremental saving with three values changed at the top of the
file takes 6.3 billion, and with three values changed at the bottom it
takes 11.4 billion cycles. Note that a real implementation would
probably need some additional checks that might increase the time again.
In exchange for these saved CPU cycles, there is a bit of additional
file access to read the old streams again. This isn't much, because they
can be read from the compressed file. Reliably measuring the total
elapsed time isn't easy, but quick tests with a larger file show the
time for incremental saving to be around 50% of a normal save for
changes at the top of the file, and 70% for changes at the bottom. On
other machines, this might be totally different.
What next?
As mentioned above, the immediate goal of this experiment was only to
get numbers on what is possible. Before making a real implementation of
this, we should see how far we can get by improving normal saving,
because that would benefit all usages, not only specific, limited
modifications.
For reference, the code of this experiment is in the CWS "calcincsave".
I also put this text into the wiki, at
http://wiki.services.openoffice.org/wiki/Calc/Performance/Incremental_Saving.
Niklas