Re: SQL backend performance
Donald Allen donaldcal...@gmail.com writes:

> Some good news: Doing this the easy way first, I did a little manual pc
> sampling. I ran gnucash (today's trunk) under gdb, let it get to the point
> where it begins to load my data from postgresql, and periodically ctrl-c'd
> in gdb and copied the interrupted location and a backtrace to an emacs
> buffer. In the interest of saving space in this message, rather than just
> copying the result into the message (I assume that attachments are not
> permitted in emails to the gnucash mailing lists), I'll tell you that I
> interrupted execution 7 times and the first 6 were identical to this:
>
>     0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
>     (gdb) bt
>     #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
>     [snip]
>     (gdb) bt
>     #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
>     #1  0x7fffe86d55c0 in load_splits_for_tx_list (be=0xa5a300,
>         stmt=<value optimized out>) at gnc-transaction-sql.c:238
>     [snip]

Yeah, we should almost NEVER use g_list_append!!

I still see 206 references to g_list_append in the source tree. We might want to take a look at all of those to see if they are ever handling lists of size > 10.

-derek

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/    PP-ASEL-IA    N1NWH
warl...@mit.edu    PGP key available

___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Re: Code formatting Re: SQL backend performance
Zitat von Phil Longstaff plongst...@rogers.com:

>> http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html
>> and my commit r18675 recently. I didn't apply this to the full source
>> tree so far in order not to destroy some people's diffs which are still
>> waiting to be applied... I think the directory you're mentioning wasn't
>> run through astyle recently, so that would explain this issue.
>
> I think the best way to handle this formatting is to create an options
> file (perhaps src/c-code.style) which is then used everywhere:
>
>     astyle --options=~/gnucash/src/c-code.style *.[ch]
>
> Christian, could you create such a central file with the options you are
> using?

Thanks for the idea. Indeed I could create such a file; however, the astyle program unfortunately behaves slightly differently between versions (which is why I mentioned usage of 1.24 in r18675). Hence, the options themselves are not enough; you would have to use the exact same version of astyle as well every time.

The options file for our current style would contain these four lines:

    indent-spaces=4
    brackets=break
    pad-oper
    pad-header

> Does anyone know of a scheme formatter?

No. Currently we just use emacs' scheme indentation.

Regards,

Christian
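Pulling Christian's four options together, the central file Phil asks for could look like the following (the filename src/c-code.style is Phil's suggestion, and astyle 1.24 is assumed per Christian's version caveat; astyle options files treat `#` as starting a comment):

```
# src/c-code.style -- shared astyle options for the GnuCash C sources
# (results depend on the astyle version; 1.24 assumed)
indent-spaces=4
brackets=break
pad-oper
pad-header
```

It would then be invoked as Phil suggests, e.g. `astyle --options=src/c-code.style *.[ch]`.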
Re: Code formatting Re: SQL backend performance
On Thu, 2010-02-25 at 09:49 +0100, Christian Stimming wrote:

> Zitat von Phil Longstaff plongst...@rogers.com:
>
>>> http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html
>>> and my commit r18675 recently. I didn't apply this to the full source
>>> tree so far in order not to destroy some people's diffs which are still
>>> waiting to be applied... I think the directory you're mentioning wasn't
>>> run through astyle recently, so that would explain this issue.
>>
>> I think the best way to handle this formatting is to create an options
>> file (perhaps src/c-code.style) which is then used everywhere:
>>
>>     astyle --options=~/gnucash/src/c-code.style *.[ch]
>>
>> Christian, could you create such a central file with the options you are
>> using?
>
> Thanks for the idea. Indeed I could create such a file; however, the
> astyle program unfortunately behaves slightly differently between
> versions (which is why I mentioned usage of 1.24 in r18675). Hence, the
> options themselves are not enough; you would have to use the exact same
> version of astyle as well every time.
>
> The options file for our current style would contain these four lines:
>
>     indent-spaces=4
>     brackets=break
>     pad-oper
>     pad-header

Hmmm... I have Ubuntu 9.10 with astyle 1.22, which doesn't even have pad-header.

>> Does anyone know of a scheme formatter?
>
> No. Currently we just use emacs' scheme indentation.

Can that be run from the command line (e.g. emacs -x scheme-indentation file.scm) with standard options, so that anyone creating or modifying an scm file can check it's indented before commit?

Phil
Re: SQL backend performance
On Tue, Feb 23, 2010 at 11:40 AM, Geert Janssens janssens-ge...@telenet.be wrote:

> On Tuesday 23 February 2010, Donald Allen wrote:
>> On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens wrote:
>>> Your assumptions on how things work are correct. And I noticed this
>>> performance decrease as well. There is one difference between the xml
>>> and the sql backends that may influence this (at least in part): the
>>> sql backend writes lots of debug information to gnucash.trace at
>>> present. I don't know how much impact this has, I haven't tested
>>> without debug information, but if we disable the debug information
>>> before the 2.4 release, it will surely narrow the gap.
>>
>> I'm seeing trace files on the order of .5 Mb. As I mentioned earlier,
>> saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20 Mb
>> uncompressed) and the 2 seconds includes the time to compress it.
>> Writing the trace file is not nearly as hard a job and the periodic
>> writes should be to the buffer cache on any reasonable machine. So I'll
>> guess (again) that the gap-narrowing won't amount to much. I hope I'm
>> wrong :-)
>
> I think true measurements will be the only way to find out what causes
> delays where.

Of course. I spent a big chunk of my career doing performance analysis on various bits of complicated software and learned very young (the hard way) that if you think you know how your software behaves and where the time is going, you are probably wrong. Measurement, done correctly, is the only way to get to the truth reliably. I sometimes had to insist on measurement by people who worked for me who were as cocky (and wrong) as I was when I was young :-)

But until the measurements are done, there's no harm in doing some educated guessing, so long as the guessing doesn't replace the measuring. If you are frequently right, it can help you set your measurement priorities. If you are frequently wrong, it reminds you that you aren't too good at modeling the behavior of software in your head.

/Don

> But it's clear there's still room for performance improvements.
>
> Geert
Re: SQL backend performance
Donald Allen donaldcal...@gmail.com writes:

>> I think true measurements will be the only way to find out what causes
>> delays where.
>
> Of course. I spent a big chunk of my career doing performance analysis on
> various bits of complicated software and learned very young (the hard
> way) that if you think you know how your software behaves and where the
> time is going, you are probably wrong. Measurement, done correctly, is
> the only way to get to the truth reliably. I sometimes had to insist on
> measurement by people who worked for me who were as cocky (and wrong) as
> I was when I was young :-)
>
> But until the measurements are done, there's no harm in doing some
> educated guessing, so long as the guessing doesn't replace the measuring.
> If you are frequently right, it can help you set your measurement
> priorities. If you are frequently wrong, it reminds you that you aren't
> too good at modeling the behavior of software in your head.

For what it's worth, the old Postgres backend was dog slow too. I certainly encourage you to perform profiling to determine where our bottlenecks are.

-derek
Re: SQL backend performance
On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:

> Donald Allen donaldcal...@gmail.com writes:
>
>> [snip measurement discussion]
>
> For what it's worth, the old Postgres backend was dog slow too. I
> certainly encourage you to perform profiling to determine where our
> bottlenecks are.

Another thing that I haven't done too much of is trying to add extra indexes or optimize queries. All SQL statements are logged to gnucash.trace. Feel free to add indexes and/or change queries to improve performance.

In general, one major problem is that certain areas of the code just assume that the data is loaded. Until we remove those assumptions or provide alternatives, it seemed the safer route to just load all data at start time.

Phil
Re: SQL backend performance
On Wed, Feb 24, 2010 at 10:32 AM, Phil Longstaff plongst...@rogers.com wrote:

> On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:
>
>> [snip measurement discussion]
>>
>> For what it's worth, the old Postgres backend was dog slow too. I
>> certainly encourage you to perform profiling to determine where our
>> bottlenecks are.
>
> Another thing that I haven't done too much of is trying to add extra
> indexes or optimize queries. All SQL statements are logged to
> gnucash.trace. Feel free to add indexes and/or change queries to improve
> performance.
>
> In general, one major problem is that certain areas of the code just
> assume that the data is loaded. Until we remove those assumptions or
> provide alternatives, it seemed the safer route to just load all data at
> start time.

I have one quick data point for you: I ran 'top' while loading my data a few times from Postgresql. 'top' is not exactly a surgical measurement tool, but it can get you started in the right direction by letting you know what the bottleneck resource is, e.g., I/O-limited, cpu-limited, etc.

What I'm seeing is that for the vast majority of the time while the data is loading, gnucash-bin is using 100% of a processor (2 core system). A postgres server process shows up a distant second occasionally, and then there's a brief period at the end of the loading where there's a burst of cpu activity by the postgres server process. But most of the time is spent waiting while the gnucash-bin process computes like crazy. This is 99% user-mode time. Now the trick is to get more specific about where the time is going.

I will offer one of my usual guesses: I don't *think* that missing indices (resulting in full-table scans) would produce behavior like this, because I believe the query processing is done on the server side, so I'm postulating that in that situation you would see high cpu utilization by the server, which is not the case. If I'm right, then this might be good news, if the bulk of the time is being spent in actual gnucash code (which can be improved once you understand the problem), as opposed, say, to libpq code. Anyway, as we discussed earlier, my guessing is not a substitute for actual measurement.

/Don

> Phil
Re: SQL backend performance
Some good news: Doing this the easy way first, I did a little manual pc sampling. I ran gnucash (today's trunk) under gdb, let it get to the point where it begins to load my data from postgresql, and periodically ctrl-c'd in gdb and copied the interrupted location and a backtrace to an emacs buffer. In the interest of saving space in this message, rather than just copying the result into the message (I assume that attachments are not permitted in emails to the gnucash mailing lists), I'll tell you that I interrupted execution 7 times and the first 6 were identical to this:

    0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
    (gdb) bt
    #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
    #1  0x7fffe86d5625 in query_transactions (be=0xa58150,
        stmt=<value optimized out>) at gnc-transaction-sql.c:385
    #2  0x7fffe86d5919 in gnc_sql_transaction_load_all_tx (be=0xa58150)
        at gnc-transaction-sql.c:768
    #3  0x70464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
    #4  0x77bc69e3 in qof_object_foreach_backend (backend_name=<value optimized out>,
        cb=0x7fffe86cd220 <initial_load_cb>, user_data=<value optimized out>)
        at qofobject.c:373
    #5  0x7fffe86cd1f1 in gnc_sql_load (be=0xa58150, book=0x9cfc80,
        loadType=<value optimized out>) at gnc-backend-sql.c:193
    #6  0x7fffe88e172b in gnc_dbi_load (qbe=0xa58150, book=0x9cfc80,
        loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
    #7  0x77bccf5c in qof_session_load (session=0xa34ec0,
        percentage_func=0x765a2d10 <gnc_window_show_progress>) at qofsession.c:1320
    #8  0x7656dd57 in gnc_post_file_open (filename=<value optimized out>)
        at gnc-file.c:758
    #9  0x0040368c in inner_main (closure=<value optimized out>,
        argc=<value optimized out>, argv=<value optimized out>) at gnucash-bin.c:521
    #10 0x701aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
    #11 0x7018056a in c_body () from /usr/lib/libguile.so.17
    #12 0x701e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
    #13 0x70180a17 in scm_i_with_continuation_barrier () from /usr/lib/libguile.so.17
    #14 0x70180ab0 in scm_c_with_continuation_barrier () from /usr/lib/libguile.so.17
    #15 0x701e5c14 in scm_i_with_guile_and_parent () from /usr/lib/libguile.so.17
    #16 0x701aa185 in scm_boot_guile () from /usr/lib/libguile.so.17
    #17 0x004032f6 in main (argc=1, argv=0x7fffe798) at gnucash-bin.c:672

I got myself a copy of the glib source code. In the comments preceding g_list_append, this appears:

     * Note that g_list_append() has to traverse the entire list
     * to find the end, which is inefficient when adding multiple
     * elements. A common idiom to avoid the inefficiency is to prepend
     * the elements and reverse the list when all elements have been added.

This is good advice that is familiar to any of us who have done a lot of Lisp/Scheme work. So I changed query_transactions to call g_list_prepend inside the transaction loop and reverse the list after the loop completes. Now my data loads in about 30 seconds, whereas it was 45 or so previously (my checking account register looks ok, so I don't think I broke anything). So I repeated the process, periodically interrupting data loading in gdb.

Now I started to see things like this:

    (gdb) bt
    #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
    #1  0x7fffe86d55c0 in load_splits_for_tx_list (be=0xa5a300,
        stmt=<value optimized out>) at gnc-transaction-sql.c:238
    #2  query_transactions (be=0xa5a300, stmt=<value optimized out>)
        at gnc-transaction-sql.c:395
    #3  0x7fffe86d5969 in gnc_sql_transaction_load_all_tx (be=0xa5a300)
        at gnc-transaction-sql.c:769
    #4  0x70464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
    #5  0x77bc69e3 in qof_object_foreach_backend (backend_name=<value optimized out>,
        cb=0x7fffe86cd270 <initial_load_cb>, user_data=<value optimized out>)
        at qofobject.c:373
    #6  0x7fffe86cd241 in gnc_sql_load (be=0xa5a300, book=0x9d2480,
        loadType=<value optimized out>) at gnc-backend-sql.c:193
    #7  0x7fffe88e172b in gnc_dbi_load (qbe=0xa5a300, book=0x9d2480,
        loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
    #8  0x77bccf5c in qof_session_load (session=0xa36b80,
        percentage_func=0x765a2d10 <gnc_window_show_progress>) at qofsession.c:1320
    #9  0x7656dd57 in gnc_post_file_open (filename=<value optimized out>)
        at gnc-file.c:758
    #10 0x0040368c in inner_main (closure=<value optimized out>,
        argc=<value optimized out>, argv=<value optimized out>) at gnucash-bin.c:521
    #11 0x701aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
    #12 0x7018056a in c_body () from /usr/lib/libguile.so.17
    #13 0x701e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
    #14 0x70180a17 in scm_i_with_continuation_barrier () from /usr/lib/libguile.so.17
    #15 0x70180ab0 in scm_c_with_continuation_barrier () from /usr/lib/libguile.so.17
    #16 0x701e5c14 in
Re: SQL backend performance
> So I applied the same treatment to load_splits_for_tx_list, substituting
> g_list_prepend for g_list_append inside the split-fetching loop and
> reversing the list on completion of the loop. I rebuilt and tried again,
> and now my data loads in about 9 seconds, approximately the same as the
> xml file and about a factor of 5 improvement! I haven't tested it yet,
> but I believe we'll see a nice improvement with sqlite3 as well, since
> this code is not specific to either database.

This is extremely good news! Thanks a lot for looking into this and reminding everyone here to read the docs of the container types we're using... I guess the same (g_list_prepend vs. append) might show up in the XML backend as well...

As for the patch and indentation: Indeed we discussed and agreed on some common indentation last summer, and we agreed on using the tool astyle to reformat the code as decided. See http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html and my commit r18675 recently. I didn't apply this to the full source tree so far in order not to destroy some people's diffs which are still waiting to be applied... I think the directory you're mentioning wasn't run through astyle recently, so that would explain this issue.

Regards,

Christian
Re: SQL backend performance
On Wed, 2010-02-24 at 14:37 -0500, Donald Allen wrote:

Great! I'll apply the patch. There are probably other places which would benefit from this. There might also be places where the order is unimportant, so that the list doesn't need to be reversed.

> BTW, I found the indentation/formatting of this (and other) files to be
> very odd, and it rendered the code pretty unreadable for me. What you see
> above is a replication of the changes I made to the original file. In
> order to do this work, though, I had to format the two procedures
> involved (in emacs with meta-ctrl-q) so I could read them (I could have
> cleaned them up some more to eliminate unnecessary pointy-brackets, but
> didn't bother). It was that code that I actually tested, so there could
> be a typo in the above. I attribute great importance to code readability
> and have some strong opinions about how C code should look. I saw
> something on gnucash-devel earlier that gave me the impression there is
> something afoot to clean up the code. Is that so? If yes, I'd like to put
> my $.02 into that pot.

Yes, there is a code formatting effort being made using a standard set of options to a code formatter. They don't match my ideal preferences, but given N people, there will be N different tastes.

Phil
Code formatting Re: SQL backend performance
On Wed, 2010-02-24 at 20:50 +0100, Christian Stimming wrote:

>> So I applied the same treatment to load_splits_for_tx_list, substituting
>> g_list_prepend for g_list_append inside the split-fetching loop and
>> reversing the list on completion of the loop. I rebuilt and tried again,
>> and now my data loads in about 9 seconds, approximately the same as the
>> xml file and about a factor of 5 improvement! I haven't tested it yet,
>> but I believe we'll see a nice improvement with sqlite3 as well, since
>> this code is not specific to either database.
>
> This is extremely good news! Thanks a lot for looking into this and
> reminding everyone here to read the docs of the container types we're
> using... I guess the same (g_list_prepend vs. append) might show up in
> the XML backend as well...
>
> As for the patch and indentation: Indeed we discussed and agreed on some
> common indentation last summer, and we agreed on using the tool astyle to
> reformat the code as decided. See
> http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html
> and my commit r18675 recently. I didn't apply this to the full source
> tree so far in order not to destroy some people's diffs which are still
> waiting to be applied... I think the directory you're mentioning wasn't
> run through astyle recently, so that would explain this issue.

I think the best way to handle this formatting is to create an options file (perhaps src/c-code.style) which is then used everywhere:

    astyle --options=~/gnucash/src/c-code.style *.[ch]

Christian, could you create such a central file with the options you are using?

Does anyone know of a scheme formatter?

Phil
Re: Code formatting Re: SQL backend performance
On Wed, 2010-02-24 at 22:18 +0100, Geert Janssens wrote:

> On Wednesday 24 February 2010, Phil Longstaff wrote:
>> On Wed, 2010-02-24 at 20:50 +0100, Christian Stimming wrote:
>>
>> Christian, could you create such a central file with the options you
>> are using?
>
> Agreed on the idea of an options file, but it should IMHO not be in src.
> I'd prefer to see it in util.

util sounds good.

Phil
Re: SQL backend performance
On Tuesday 23 February 2010, Donald Allen wrote:

> As I've mentioned in other posts, I have a pretty large gnucash datafile
> -- more than 20 Mb uncompressed. I've been testing the SQL backend and
> I'm concerned about the performance, particularly startup performance.
>
> I've been doing this testing on an inexpensive little HP desktop machine:
> dual-core 2.8 Ghz AMD Athlon, 2 Gb memory, 300 Gb 7200 rpm SATA drive,
> Arch Linux, 2.6.32 kernel. It may have been cheap, but it's damned fast
> (I've been in the computer business from 1964 until I retired last Oct.,
> and the cost-performance of today's hardware just boggles my mind,
> especially when I think about what we put up with 20 or 30 years ago; you
> haven't lived until you've tried getting your work done *and* stayed sane
> while sitting in front of a VT100 talking to an overloaded Vax 780
> running 4.1BSD; motto: it was hard to build, it ought to be hard to use).
>
> From my gnucash xml file, I've created sqlite3 and postgresql databases
> containing the same data. Here are the average data-load timings, in
> seconds, from just a couple of tests per storage method (this is from the
> point gnucash says "reading file" until it's up):
>
>     xml         9.10
>     sqlite3    45.41
>     postgresql 45.46
>
> My mental model, which may be wrong, of what the SQL backend is doing is
> that it is reading the entire database at startup into memory, as opposed
> to retrieving the data as-needed (which is the way I'd guess
> gnucash+database would be architected if it was being written from
> scratch; I'll guess further that given the file-based history of gnucash,
> its existing architecture precluded using the incremental approach when
> adding the database backend). I'm guessing this because of the amount of
> time it takes to start up when pointed either at postgresql or sqlite3
> versions of the same data.
>
> I am further assuming that, with the SQL backend, as the user adds new
> things (accounts, transactions, etc.), new objects get inserted into the
> database and modified objects get updated, on the spot. I'm guessing this
> because as I make changes, the 'save' button remains grayed-out. So the
> primary advantage of the database backend is that file saves are not
> necessary. But, at least in my case, I am paying about a 36 second price
> at startup to avoid the file saves. File saves on my machine take about 2
> seconds, much faster than reading the file, probably because the data is
> being written to the buffer cache. So I'd need to do 18 file saves during
> a gnucash session to be worse off (kept waiting) with the xml file than
> with the data in either flavor of database. And that assumes that I am
> always waiting for file saves to complete, which is not always the case
> (I frequently do them after completing a chunk of work, e.g., entering an
> investment transaction from a statement, and the file-save happens while
> I turn my attention to the paperwork to figure out what I need to do
> next).
>
> While I didn't do a lot of timed tests and I've used the word "guess" an
> awful lot above (corrections to what I've said above from someone who
> actually knows what (s)he is talking about are perfectly welcome), the
> numbers confirm what I already knew from many more untimed tests: that at
> least in my case, startup takes an unacceptably long time. If this is the
> way the released version performs and my back-of-the-envelope analysis of
> the tradeoff is correct, I don't think I'd use the database stuff and
> would just continue with the xml file. But if you think I've missed
> something here, please feel free.

Your assumptions on how things work are correct. And I noticed this performance decrease as well. There is one difference between the xml and the sql backends that may influence this (at least in part): the sql backend writes lots of debug information to gnucash.trace at present. I don't know how much impact this has, I haven't tested without debug information, but if we disable the debug information before the 2.4 release, it will surely narrow the gap.

In the future this may still be improved such that data is only queried for when needed, but this requires a lot of internal modifications. This would not have been possible for the next stable release.

Geert
Re: SQL backend performance
On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens janssens-ge...@telenet.be wrote:

> On Tuesday 23 February 2010, Donald Allen wrote:
>
>> [snip: the original timing report and tradeoff analysis, quoted in full
>> in Geert's message above]
>
> Your assumptions on how things work are correct. And I noticed this
> performance decrease as well. There is one difference between the xml and
> the sql backends that may influence this (at least in part): the sql
> backend writes lots of debug information to gnucash.trace at present. I
> don't know how much impact this has, I haven't tested without debug
> information, but if we disable the debug information before the 2.4
> release, it will surely narrow the gap.

I'm seeing trace files on the order of .5 Mb. As I mentioned earlier, saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20 Mb uncompressed) and the 2 seconds includes the time to compress it. Writing the trace file is not nearly as hard a job, and the periodic writes should be to the buffer cache on any reasonable machine. So I'll guess (again) that the gap-narrowing won't amount to much. I hope I'm wrong :-)

> In the future this may still be improved such that data is only queried
> for when needed, but this requires a lot of internal modifications. This
> would not have been possible for the next stable release.

I understand why this was done as it was; I said so in my previous email. If I had been running this project, I'm quite sure I'd have done the same thing, so I'm not at all critical. I also note that I'm a bit of an extreme case. I've been
Re: SQL backend performance
On Tuesday 23 February 2010, Donald Allen wrote:

> On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens wrote:
>
>> Your assumptions on how things work are correct. And I noticed this
>> performance decrease as well. There is one difference between the xml
>> and the sql backends that may influence this (at least in part): the sql
>> backend writes lots of debug information to gnucash.trace at present. I
>> don't know how much impact this has, I haven't tested without debug
>> information, but if we disable the debug information before the 2.4
>> release, it will surely narrow the gap.
>
> I'm seeing trace files on the order of .5 Mb. As I mentioned earlier,
> saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20 Mb
> uncompressed) and the 2 seconds includes the time to compress it. Writing
> the trace file is not nearly as hard a job, and the periodic writes
> should be to the buffer cache on any reasonable machine. So I'll guess
> (again) that the gap-narrowing won't amount to much. I hope I'm wrong :-)

I think true measurements will be the only way to find out what causes delays where. But it's clear there's still room for performance improvements.

Geert