Re: SQL backend performance

2010-03-02 Thread Derek Atkins
Donald Allen <donaldcal...@gmail.com> writes:

 Some good news:

 Doing this the easy way first, I did a little manual pc sampling. I
 ran gnucash (today's trunk) under gdb, let it get to the point where
 it begins to load my data from postgresql, and periodically ctrl-c'd
 in gdb and copied the interrupted location and a backtrace to an emacs
 buffer. In the interest of saving space in this message, rather than
 just copying the result into the message (I assume that attachments
 are not permitted in emails to the gnucash mailing lists), I'll tell
 you that I interrupted execution 7 times and the first 6 were
 identical to this:

 0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
 (gdb) bt
 #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
[snip]

 (gdb) bt
 #0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
 #1  0x7fffe86d55c0 in load_splits_for_tx_list (be=0xa5a300,
 stmt=<value optimized out>) at gnc-transaction-sql.c:238
[snip]

Yeah, we should almost NEVER use g_list_append!!  I still see 206
references to g_list_append in the source tree.  We might want to take a
look at all of those to see if they are ever handling lists of size > 10.
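For illustration, the standard workaround is the prepend-then-reverse idiom the GLib docs recommend. A minimal standalone sketch (using a hand-rolled list rather than GList itself, so it compiles with no GLib headers):

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal singly linked list standing in for GList (illustrative only). */
typedef struct Node { int data; struct Node *next; } Node;

/* O(1) prepend -- the cheap operation. */
static Node *list_prepend(Node *head, int value)
{
    Node *n = malloc(sizeof *n);
    n->data = value;
    n->next = head;
    return n;
}

/* O(n) in-place reverse, done once after the loop. */
static Node *list_reverse(Node *head)
{
    Node *prev = NULL;
    while (head != NULL) {
        Node *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

/* Builds [0, 1, ..., n-1] in O(n) total; appending each element
 * instead would traverse to the tail every time, i.e. O(n^2). */
static Node *build_list(int n)
{
    Node *head = NULL;
    for (int i = 0; i < n; i++)
        head = list_prepend(head, i);
    return list_reverse(head);
}
```

With GLib itself the shape is the same: g_list_prepend() inside the loop, then a single g_list_reverse() when the loop finishes.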

-derek
-- 
   Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
   Member, MIT Student Information Processing Board  (SIPB)
   URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
   warl...@mit.edu                     PGP key available
___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel


Re: Code formatting Re: SQL backend performance

2010-02-25 Thread Christian Stimming

Zitat von Phil Longstaff <plongst...@rogers.com>:

http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html and
my commit r18675 recently. I didn't apply this to the full source  
tree so far

in order not to destroy some people's diffs which are still waiting to be
applied... I think the directory you're mentioning wasn't run through astyle
recently, so that would explain this issue.


I think the best way to handle this formatting is to create an options
file (perhaps src/c-code.style) which is then used everywhere:

astyle --options=~/gnucash/src/c-code.style *.[ch]

Christian, could you create such a central file with the options you are
using?


Thanks for the idea. Indeed I could create such a file; however, the
astyle program unfortunately behaves slightly differently between
versions (which is why I mentioned usage of 1.24 in r18675). Hence,
the options themselves are not enough; you would also have to use the
exact same version of astyle every time.


The options file for our current style would contain these four lines:

indent-spaces=4
brackets=break
pad-oper
pad-header
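Put into practice, that could look roughly like this (the file name and location follow Phil's suggestion and are only an assumption; astyle 1.24 is assumed, as noted above):

```shell
# Write the four agreed options to a shared style file (path is illustrative).
cat > c-code.style <<'EOF'
indent-spaces=4
brackets=break
pad-oper
pad-header
EOF

# Then, with astyle 1.24 installed, format the sources in place:
#   astyle --options=c-code.style *.[ch]
```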


Does anyone know of a scheme formatter?


No. Currently we just use emacs' scheme indentation.

Regards,

Christian



Re: Code formatting Re: SQL backend performance

2010-02-25 Thread Phil Longstaff
On Thu, 2010-02-25 at 09:49 +0100, Christian Stimming wrote:
 Zitat von Phil Longstaff <plongst...@rogers.com>:
  http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html 
  and
  my commit r18675 recently. I didn't apply this to the full source  
  tree so far
  in order not to destroy some people's diffs which are still waiting to be
  applied... I think the directory you're mentioning wasn't run through 
  astyle
  recently, so that would explain this issue.
 
  I think the best way to handle this formatting is to create an options
  file (perhaps src/c-code.style) which is then used everywhere:
 
  astyle --options=~/gnucash/src/c-code.style *.[ch]
 
  Christian, could you create such a central file with the options you are
  using?
 
 Thanks for the idea. Indeed I could create such a file; however, the  
 astyle program unfortunately behaves slightly differently between  
 versions (which is why I mentioned usage of 1.24 in r18675). Hence,  
 the options themselves are not enough; you would also have to use the
 exact same version of astyle every time.
 
 The options file for our current style would contain these four lines:
 
 indent-spaces=4
 brackets=break
 pad-oper
 pad-header

Hmmm...   I have Ubuntu 9.10 with astyle 1.22, which doesn't even have
pad-header.

  Does anyone know of a scheme formatter?
 
 No. Currently we just use emacs' scheme indentation.

Can that be run from the command line (e.g. emacs -x scheme-indentation
file.scm) with standard options, so that anyone creating or modifying a
.scm file can check that it's properly indented before commit?

Phil



Re: SQL backend performance

2010-02-24 Thread Donald Allen
On Tue, Feb 23, 2010 at 11:40 AM, Geert Janssens
<janssens-ge...@telenet.be> wrote:
 On Tuesday 23 February 2010, Donald Allen wrote:
 On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens
  Your assumptions on how things work are correct.
 
  And I noticed this performance decrease as well.
 
  There is one difference between the xml and the sql backends that may
  influence this (at least in part): the sql backend writes lots of debug
  information to gnucash.trace at present. I don't know how much impact
  this has, I haven't tested without debug information, but if we disable
  the debug information before the 2.4 release, it will surely narrow the
  gap.

 I'm seeing trace files on the order of .5 Mb. As I mentioned earlier,
 saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20
 Mb uncompressed) and the 2 seconds includes the time to compress it.
 Writing the trace file is not nearly as hard a job and the periodic
 writes should be to the buffer cache on any reasonable machine. So
 I'll guess (again) that the gap-narrowing won't amount to much. I hope
 I'm wrong :-)

 I think true measurements will be the only way to find out what causes delays
 where.

Of course. I spent a big chunk of my career doing performance analysis
on various bits of complicated software and learned very young (the
hard way) that if you think you know how your software behaves and
where the time is going, you are probably wrong. Measurement, done
correctly, is the only way to get to the truth reliably. I sometimes
had to insist on measurement by people who worked for me who were as
cocky (and wrong) as I was when I was young :-)

But until the measurements are done, there's no harm in doing some
educated guessing, so long as the guessing doesn't replace the
measuring. If you are frequently right, it can help you set your
measurement priorities. If you are frequently wrong, it reminds you
that you aren't too good at modeling the behavior of software in your
head.

/Don

 But it's clear there's still room for performance improvements.

 Geert



Re: SQL backend performance

2010-02-24 Thread Derek Atkins
Donald Allen <donaldcal...@gmail.com> writes:

 I think true measurements will be the only way to find out what causes delays
 where.

 Of course. I spent a big chunk of my career doing performance analysis
 on various bits of complicated software and learned very young (the
 hard way) that if you think you know how your software behaves and
 where the time is going, you are probably wrong. Measurement, done
 correctly, is the only way to get to the truth reliably. I sometimes
 had to insist on measurement by people who worked for me who were as
 cocky (and wrong) as I was when I was young :-)

 But until the measurements are done, there's no harm in doing some
 educated guessing, so long as the guessing doesn't replace the
 measuring. If you are frequently right, it can help you set your
 measurement priorities. If you are frequently wrong, it reminds you
 that you aren't too good at modeling the behavior of software in your
 head.

For what it's worth, the old Postgres backend was dog slow too.

I certainly encourage you to perform profiling to determine where our
bottlenecks are.

-derek
-- 
   Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
   Member, MIT Student Information Processing Board  (SIPB)
   URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
   warl...@mit.edu                     PGP key available


Re: SQL backend performance

2010-02-24 Thread Phil Longstaff
On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:
 Donald Allen <donaldcal...@gmail.com> writes:
 
  I think true measurements will be the only way to find out what causes 
  delays
  where.
 
  Of course. I spent a big chunk of my career doing performance analysis
  on various bits of complicated software and learned very young (the
  hard way) that if you think you know how your software behaves and
  where the time is going, you are probably wrong. Measurement, done
  correctly, is the only way to get to the truth reliably. I sometimes
  had to insist on measurement by people who worked for me who were as
  cocky (and wrong) as I was when I was young :-)
 
  But until the measurements are done, there's no harm in doing some
  educated guessing, so long as the guessing doesn't replace the
  measuring. If you are frequently right, it can help you set your
  measurement priorities. If you are frequently wrong, it reminds you
  that you aren't too good at modeling the behavior of software in your
  head.
 
 For what it's worth, the old Postgres backend was dog slow too.
 
 I certainly encourage you to perform profiling to determine where our
 bottlenecks are.

Another thing that I haven't done too much of is trying to add extra
indexes or optimize queries.  All SQL statements are logged to
gnucash.trace.  Feel free to add indexes and/or change queries to
improve performance.

In general, one major problem is that certain areas of the code just
assume that the data is loaded.  Until we remove those assumptions or
provide alternatives, it seemed the safer route to just load all data at
start time.

Phil



Re: SQL backend performance

2010-02-24 Thread Donald Allen
On Wed, Feb 24, 2010 at 10:32 AM, Phil Longstaff <plongst...@rogers.com> wrote:
 On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:
 Donald Allen <donaldcal...@gmail.com> writes:

  I think true measurements will be the only way to find out what causes 
  delays
  where.
 
  Of course. I spent a big chunk of my career doing performance analysis
  on various bits of complicated software and learned very young (the
  hard way) that if you think you know how your software behaves and
  where the time is going, you are probably wrong. Measurement, done
  correctly, is the only way to get to the truth reliably. I sometimes
  had to insist on measurement by people who worked for me who were as
  cocky (and wrong) as I was when I was young :-)
 
  But until the measurements are done, there's no harm in doing some
  educated guessing, so long as the guessing doesn't replace the
  measuring. If you are frequently right, it can help you set your
  measurement priorities. If you are frequently wrong, it reminds you
  that you aren't too good at modeling the behavior of software in your
  head.

 For what it's worth, the old Postgres backend was dog slow too.

 I certainly encourage you to perform profiling to determine where our
 bottlenecks are.

 Another thing that I haven't done too much of is trying to add extra
 indexes or optimize queries.  All SQL statements are logged to
 gnucash.trace.  Feel free to add indexes and/or change queries to
 improve performance.

 In general, one major problem is that certain areas of the code just
 assume that the data is loaded.  Until we remove those assumptions or
 provide alternatives, it seemed the safer route to just load all data at
 start time.

I have one quick data point for you: I ran 'top' while loading my data
a few times from Postgresql. 'top' is not exactly a surgical
measurement tool, but it can get you started in the right direction by
letting you know what the bottleneck resource is, e.g., I/O-limited,
cpu-limited, etc. What I'm seeing is that for the vast majority of the
time while the data is loading, gnucash-bin is using 100% of a
processor (2 core system). A postgres server process shows up a
distant second occasionally, and then there's a brief period at the
end of the loading where there's a burst of cpu activity by the
postgres server process. But most of the time is spent waiting while
the gnucash-bin process computes like crazy. This is 99% user-mode
time. Now the trick is to get more specific about where the time is
going.

I will offer one of my usual guesses: I don't *think* that missing
indices (resulting in full-table scans) would produce behavior like
this, because I believe the query processing is done on the server
side, so I'm postulating that in that situation, you would see high
cpu utilization by the server, which is not the case. If I'm right,
then this might be good news, if the bulk of the time is being spent
in actual gnucash code (which can be improved once you understand the
problem), as opposed, say, to libpq code. Anyway, as we discussed
earlier, my guessing is not a substitute for actual measurement.

/Don



 Phil



Re: SQL backend performance

2010-02-24 Thread Donald Allen
Some good news:

Doing this the easy way first, I did a little manual pc sampling. I
ran gnucash (today's trunk) under gdb, let it get to the point where
it begins to load my data from postgresql, and periodically ctrl-c'd
in gdb and copied the interrupted location and a backtrace to an emacs
buffer. In the interest of saving space in this message, rather than
just copying the result into the message (I assume that attachments
are not permitted in emails to the gnucash mailing lists), I'll tell
you that I interrupted execution 7 times and the first 6 were
identical to this:

0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
(gdb) bt
#0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
#1  0x7fffe86d5625 in query_transactions (be=0xa58150, stmt=<value
optimized out>) at gnc-transaction-sql.c:385
#2  0x7fffe86d5919 in gnc_sql_transaction_load_all_tx
(be=0xa58150) at gnc-transaction-sql.c:768
#3  0x70464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
#4  0x77bc69e3 in qof_object_foreach_backend
(backend_name=<value optimized out>, cb=0x7fffe86cd220
<initial_load_cb>, user_data=<value optimized out>) at qofobject.c:373
#5  0x7fffe86cd1f1 in gnc_sql_load (be=0xa58150, book=0x9cfc80,
loadType=<value optimized out>) at gnc-backend-sql.c:193
#6  0x7fffe88e172b in gnc_dbi_load (qbe=0xa58150, book=0x9cfc80,
loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
#7  0x77bccf5c in qof_session_load (session=0xa34ec0,
percentage_func=0x765a2d10 <gnc_window_show_progress>) at
qofsession.c:1320
#8  0x7656dd57 in gnc_post_file_open (filename=<value
optimized out>) at gnc-file.c:758
#9  0x0040368c in inner_main (closure=<value optimized out>,
argc=<value optimized out>, argv=<value optimized out>) at
gnucash-bin.c:521
#10 0x701aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
#11 0x7018056a in c_body () from /usr/lib/libguile.so.17
#12 0x701e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
#13 0x70180a17 in scm_i_with_continuation_barrier () from
/usr/lib/libguile.so.17
#14 0x70180ab0 in scm_c_with_continuation_barrier () from
/usr/lib/libguile.so.17
#15 0x701e5c14 in scm_i_with_guile_and_parent () from
/usr/lib/libguile.so.17
#16 0x701aa185 in scm_boot_guile () from /usr/lib/libguile.so.17
#17 0x004032f6 in main (argc=1, argv=0x7fffe798) at
gnucash-bin.c:672

I got myself a copy of the glib source code. In the comments preceding
g_list_append, this appears:

 * Note that g_list_append() has to traverse the entire list
 * to find the end, which is inefficient when adding multiple
 * elements. A common idiom to avoid the inefficiency is to prepend
 * the elements and reverse the list when all elements have been added.

This is good advice that is familiar to any of us who have done a lot
of Lisp/Scheme work. So I changed query_transactions to call
g_list_prepend inside the transaction loop and reverse the list after
the loop completes. Now my data loads in about 30 seconds, whereas it
was 45 or so previously (my checking account register looks ok, so I
don't think I broke anything). So I repeated the process, periodically
interrupting data loading in gdb. Now I started to see things like
this:

(gdb) bt
#0  0x70471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
#1  0x7fffe86d55c0 in load_splits_for_tx_list (be=0xa5a300,
stmt=<value optimized out>) at gnc-transaction-sql.c:238
#2  query_transactions (be=0xa5a300, stmt=<value optimized out>) at
gnc-transaction-sql.c:395
#3  0x7fffe86d5969 in gnc_sql_transaction_load_all_tx
(be=0xa5a300) at gnc-transaction-sql.c:769
#4  0x70464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
#5  0x77bc69e3 in qof_object_foreach_backend
(backend_name=<value optimized out>, cb=0x7fffe86cd270
<initial_load_cb>, user_data=<value optimized out>) at qofobject.c:373
#6  0x7fffe86cd241 in gnc_sql_load (be=0xa5a300, book=0x9d2480,
loadType=<value optimized out>) at gnc-backend-sql.c:193
#7  0x7fffe88e172b in gnc_dbi_load (qbe=0xa5a300, book=0x9d2480,
loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
#8  0x77bccf5c in qof_session_load (session=0xa36b80,
percentage_func=0x765a2d10 <gnc_window_show_progress>) at
qofsession.c:1320
#9  0x7656dd57 in gnc_post_file_open (filename=<value
optimized out>) at gnc-file.c:758
#10 0x0040368c in inner_main (closure=<value optimized out>,
argc=<value optimized out>, argv=<value optimized out>) at
gnucash-bin.c:521
#11 0x701aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
#12 0x7018056a in c_body () from /usr/lib/libguile.so.17
#13 0x701e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
#14 0x70180a17 in scm_i_with_continuation_barrier () from
/usr/lib/libguile.so.17
#15 0x70180ab0 in scm_c_with_continuation_barrier () from
/usr/lib/libguile.so.17
#16 0x701e5c14 in 

Re: SQL backend performance

2010-02-24 Thread Christian Stimming
 So I applied the same treatment to load_splits_for_tx_list,
 substituting g_list_prepend for g_list_append inside the
 split-fetching loop and reversing the list on completion of the loop.
 I rebuilt and tried again and now my data loads in about 9 seconds,
 approximately the same as the xml file and about a factor of 5
 improvement! I haven't tested it yet, but I believe we'll see a nice
 improvement with sqlite3 as well, since this code is not specific to
 either database.

This is extremely good news! Thanks a lot for looking into this and for
reminding everyone here to read the docs of the container types we're using...

I guess the same (g_list_prepend vs. append) might show up in the XML backend 
as well...

As for the patch and indentation: Indeed we discussed and agreed on some 
common indentation last summer, and we agreed on using the tool astyle to 
reformat the code as decided. See 
http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html and 
my commit r18675 recently. I didn't apply this to the full source tree so far 
in order not to destroy some people's diffs which are still waiting to be 
applied... I think the directory you're mentioning wasn't run through astyle 
recently, so that would explain this issue.

Regards,

Christian


Re: SQL backend performance

2010-02-24 Thread Phil Longstaff
On Wed, 2010-02-24 at 14:37 -0500, Donald Allen wrote:

Great!  I'll apply the patch.  There are probably other places which
would benefit from this.  There might also be places where the order is
unimportant so that the list doesn't need to be reversed.

 BTW, I found the indentation/formatting of this (and other) files to
 be very odd, and it rendered the code pretty unreadable for me. What
 you see above is a replication of the changes I made to the original
 file. In order to do this work, though, I had to format the two
 procedures involved (in emacs with meta-ctrl-q) so I could read them
 (I could have cleaned them up some more to eliminate unnecessary
 pointy-brackets, but didn't bother). It was that code that I actually
 tested, so there could be a typo in the above.
 
 I attribute great importance to code readability and have some strong
 opinions about how C code should look. I saw something on
 gnucash-devel earlier that gave me the impression there is something
 afoot to clean up the code. Is that so? If yes, I'd like to put my
 $.02 into that pot.

Yes, there is a code formatting effort being made using a standard set
of options to a code formatter.  They don't match my ideal preferences,
but given N people, there will be N different tastes.

Phil



Code formatting Re: SQL backend performance

2010-02-24 Thread Phil Longstaff
On Wed, 2010-02-24 at 20:50 +0100, Christian Stimming wrote:
  So I applied the same treatment to load_splits_for_tx_list,
  substituting g_list_prepend for g_list_append inside the
  split-fetching loop and reversing the list on completion of the loop.
  I rebuilt and tried again and now my data loads in about 9 seconds,
  approximately the same as the xml file and about a factor of 5
  improvement! I haven't tested it yet, but I believe we'll see a nice
  improvement with sqlite3 as well, since this code is not specific to
  either database.
 
 This is extremely good news! Thanks a lot for looking into this and reminding 
 anyone here to read the docs of the container types we're using... 
 
 I guess the same (g_list_prepend vs. append) might show up in the XML backend 
 as well...
 
 As for the patch and indentation: Indeed we discussed and agreed on some 
 common indentation last summer, and we agreed on using the tool astyle to 
 reformat the code as decided. See 
 http://lists.gnucash.org/pipermail/gnucash-devel/2009-August/026121.html and 
 my commit r18675 recently. I didn't apply this to the full source tree so far 
 in order not to destroy some people's diffs which are still waiting to be 
 applied... I think the directory you're mentioning wasn't run through astyle 
 recently, so that would explain this issue.

I think the best way to handle this formatting is to create an options
file (perhaps src/c-code.style) which is then used everywhere:

astyle --options=~/gnucash/src/c-code.style *.[ch]

Christian, could you create such a central file with the options you are
using?

Does anyone know of a scheme formatter?

Phil



Re: Code formatting Re: SQL backend performance

2010-02-24 Thread Phil Longstaff
On Wed, 2010-02-24 at 22:18 +0100, Geert Janssens wrote:
 On Wednesday 24 February 2010, Phil Longstaff wrote:
  On Wed, 2010-02-24 at 20:50 +0100, Christian Stimming wrote:
  Christian, could you create such a central file with the options you are
  using?
  
 Agreed on the idea of an options file, but it should IMHO not be in src. I'd 
 prefer to see it in util.

util sounds good.

Phil



Re: SQL backend performance

2010-02-23 Thread Geert Janssens
On Tuesday 23 February 2010, Donald Allen wrote:
 As I've mentioned in other posts, I have a pretty large gnucash
 datafile -- more than 20 Mb uncompressed. I've been testing the SQL
 backend and I'm concerned about the performance, particularly startup
 performance.
 
 I've been doing this testing on an inexpensive little HP desktop
 machine, dual-core 2.8 Ghz AMD Athlon, 2 Gb memory, 300 Gb 7200 rpm
 SATA drive, Arch Linux, 2.6.32 kernel. It may have been cheap, but
 it's damned fast (I've been in the computer business since 1964 until
 I retired last Oct. and the cost-performance of today's hardware just
 boggles my mind, especially when I think about what we put up with 20
 or 30 years ago; you haven't lived until you've tried getting your
 work done *and* stayed sane while sitting in front of a VT100 talking
 to an overloaded Vax 780 running 4.1BSD; motto: It was hard to build,
 it ought to be hard to use). From my gnucash xml file, I've created
 sqlite3 and postgresql databases containing the same data.
 
 Here are the average data-load timings, in seconds, from just a couple
 of tests per storage method (this is from the point gnucash says
 reading file until it's up):
 
 xml          9.10
 sqlite3      45.41
 postgresql   45.46
 
 My mental model, which may be wrong, of what the SQL backend is doing
 is that it is reading the entire database at startup into memory, as
 opposed to retrieving the data as-needed (which is the way I'd guess
 gnucash+database would be architected if it was being written from
 scratch; I'll guess further that given the file-based history of
 gnucash, its existing architecture precluded using the incremental
 approach when adding the database backend). I'm guessing this because
 of the amount of time it takes to start up when pointed either at
 postgresql or sqlite3 versions of the same data. I am further assuming
 that, with the SQL backend, as the user adds new things (accounts,
 transactions, etc.), new objects get inserted into the database and
 modified objects get updated, on the spot. I'm guessing this because
 as I make changes, the 'save' button remains grayed-out. So the
 primary advantage of the database backend is that file saves are not
 necessary. But, at least in my case, I am paying about a 36 second
 price at startup to avoid the file saves. File saves on my machine
 take about 2 seconds, much faster than reading the file, probably
 because the data is being written to the buffer cache. So I'd need to
 do 18 file saves during a gnucash session to be worse off (kept
 waiting) with the xml file than with the data in either flavor of
 database. And that assumes that I am always waiting for file saves to
 complete, which is not always the case (I frequently do them after
 completing a chunk of work, e.g., entering an investment transaction
 from a statement, and the file-save happens while I turn my attention
 to the paperwork to figure out what I need to do next).
 
 While I didn't do a lot of timed tests and I've used the word guess
 an awful lot above (corrections to what I've said above from someone
 who actually knows what (s)he is talking about are perfectly welcome),
 the numbers confirm what I already knew from many more untimed tests,
 that at least in my case, startup takes an unacceptably long time. If
 this is the way the released version performs and my
 back-of-the-envelope analysis of the tradeoff is correct, I don't
 think I'd use the database stuff and just continue with the xml file.
 But if you think I've missed something here, please feel free.
 
Your assumptions on how things work are correct.

And I noticed this performance decrease as well.

There is one difference between the xml and the sql backends that may 
influence this (at least in part): the sql backend writes lots of debug 
information to gnucash.trace at present. I don't know how much impact this 
has, I haven't tested without debug information, but if we disable the debug 
information before the 2.4 release, it will surely narrow the gap.

In the future this may still be improved such that data is only queried for 
when needed, but this requires a lot of internal modifications. This would not 
have been possible for the next stable release.

Geert


Re: SQL backend performance

2010-02-23 Thread Donald Allen
On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens
<janssens-ge...@telenet.be> wrote:
 On Tuesday 23 February 2010, Donald Allen wrote:
 As I've mentioned in other posts, I have a pretty large gnucash
 datafile -- more than 20 Mb uncompressed. I've been testing the SQL
 backend and I'm concerned about the performance, particularly startup
 performance.

 I've been doing this testing on an inexpensive little HP desktop
 machine, dual-core 2.8 Ghz AMD Athlon, 2 Gb memory, 300 Gb 7200 rpm
 SATA drive, Arch Linux, 2.6.32 kernel. It may have been cheap, but
 it's damned fast (I've been in the computer business since 1964 until
 I retired last Oct. and the cost-performance of today's hardware just
 boggles my mind, especially when I think about what we put up with 20
 or 30 years ago; you haven't lived until you've tried getting your
 work done *and* stayed sane while sitting in front of a VT100 talking
 to an overloaded Vax 780 running 4.1BSD; motto: It was hard to build,
 it ought to be hard to use). From my gnucash xml file, I've created
 sqlite3 and postgresql databases containing the same data.

 Here are the average data-load timings, in seconds, from just a couple
 of tests per storage method (this is from the point gnucash says
 reading file until it's up):

 xml          9.10
 sqlite3      45.41
 postgresql   45.46

 My mental model, which may be wrong, of what the SQL backend is doing
 is that it is reading the entire database at startup into memory, as
 opposed to retrieving the data as-needed (which is the way I'd guess
 gnucash+database would be architected if it was being written from
 scratch; I'll guess further that given the file-based history of
 gnucash, its existing architecture precluded using the incremental
 approach when adding the database backend). I'm guessing this because
 of the amount of time it takes to start up when pointed either at
 postgresql or sqlite3 versions of the same data. I am further assuming
 that, with the SQL backend, as the user adds new things (accounts,
 transactions, etc.), new objects get inserted into the database and
 modified objects get updated, on the spot. I'm guessing this because
 as I make changes, the 'save' button remains grayed-out. So the
 primary advantage of the database backend is that file saves are not
 necessary. But, at least in my case, I am paying about a 36 second
 price at startup to avoid the file saves. File saves on my machine
 take about 2 seconds, much faster than reading the file, probably
 because the data is being written to the buffer cache. So I'd need to
 do 18 file saves during a gnucash session to be worse off (kept
 waiting) with the xml file than with the data in either flavor of
 database. And that assumes that I am always waiting for file saves to
 complete, which is not always the case (I frequently do them after
 completing a chunk of work, e.g., entering an investment transaction
 from a statement, and the file-save happens while I turn my attention
 to the paperwork to figure out what I need to do next).

 While I didn't do a lot of timed tests and I've used the word guess
 an awful lot above (corrections to what I've said above from someone
 who actually knows what (s)he is talking about are perfectly welcome),
 the numbers confirm what I already knew from many more untimed tests,
 that at least in my case, startup takes an unacceptably long time. If
 this is the way the released version performs and my
 back-of-the-envelope analysis of the tradeoff is correct, I don't
 think I'd use the database stuff and just continue with the xml file.
 But if you think I've missed something here, please feel free.

 Your assumptions on how things work are correct.

 And I noticed this performance decrease as well.

 There is one difference between the xml and the sql backends that may
 influence this (at least in part): the sql backend writes lots of debug
 information to gnucash.trace at present. I don't know how much impact this
 has, I haven't tested without debug information, but if we disable the debug
 information before the 2.4 release, it will surely narrow the gap.

I'm seeing trace files on the order of .5 Mb. As I mentioned earlier,
saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20
Mb uncompressed) and the 2 seconds includes the time to compress it.
Writing the trace file is not nearly as hard a job and the periodic
writes should be to the buffer cache on any reasonable machine. So
I'll guess (again) that the gap-narrowing won't amount to much. I hope
I'm wrong :-)


 In the future this may still be improved such that data is only queried for
 when needed, but this requires a lot of internal modifications. This would not
 have been possible for the next stable release.

I understand why this was done as it was; I said so in my previous
email. If I had been running this project, I'm quite sure I'd have
done the same thing, so I'm not at all critical. I also note that I'm
a bit of an extreme case. I've been 

Re: SQL backend performance

2010-02-23 Thread Geert Janssens
On Tuesday 23 February 2010, Donald Allen wrote:
 On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens
  Your assumptions on how things work are correct.
 
  And I noticed this performance decrease as well.
 
  There is one difference between the xml and the sql backends that may
  influence this (at least in part): the sql backend writes lots of debug
  information to gnucash.trace at present. I don't know how much impact
  this has, I haven't tested without debug information, but if we disable
  the debug information before the 2.4 release, it will surely narrow the
  gap.
 
 I'm seeing trace files on the order of .5 Mb. As I mentioned earlier,
 saving my xml file takes about 2 seconds. It's about 2.5 Mb (over 20
 Mb uncompressed) and the 2 seconds includes the time to compress it.
 Writing the trace file is not nearly as hard a job and the periodic
 writes should be to the buffer cache on any reasonable machine. So
 I'll guess (again) that the gap-narrowing won't amount to much. I hope
 I'm wrong :-)
 
I think true measurements will be the only way to find out what causes delays 
where. But it's clear there's still room for performance improvements.

Geert