On Wed, Aug 16, 2017 at 5:50 PM, Matthew Harris <[email protected]> wrote:
> How big are your input files,

$ bean-report $L stats-entries
Type         Num Entries
-----------  -----------
Transaction        12259
Price               4511
Balance             3525
Document           1699
Open                760
Event               308
Close               197
Commodity           128
Note                 78
Pad                  13
Query                 3
~Total~           23481
-----------  -----------

31677 postings. Very similar scale to yours.

> and how long does Beancount take to parse them?

bergamot:~$ time bean-check $L

real    0m6.882s
user    0m6.784s
sys     0m0.096s

bergamot:~$ time bean-check $L

real    0m0.508s
user    0m0.468s
sys     0m0.036s

On one of those:
https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc6i5syh.html
with 32GB of RAM, running Linux. (It's basically a souped-up laptop in a
small box.)

> My input file is 2.5M, 55k lines, 19515 directives (38017 postings in
> 13347 transactions). Running bean-web without a pickle cache takes about 30
> seconds to display my data on a MacBook Air. It's gotten to the point that
> it's rather painful to update and reprocess my file.

I suspect most of those 30 seconds are rendering time.
Try running this:

$ time bean-check -v $L
INFO    : Operation: 'beancount.parser.parser.parse_file'            Time:  651 ms
INFO    : Operation: 'beancount.parser.parser'                       Time:  651 ms
INFO    : Operation: 'beancount.ops.pad'                             Time:   71 ms
INFO    : Operation: 'beancount.ops.documents'                       Time:   86 ms
INFO    : Operation: 'beancount.plugins.ira_contribs'                Time:   26 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'             Time:  209 ms
INFO    : Operation: 'beancount.plugins.sellgains'                   Time:   26 ms
INFO    : Operation: 'washsales.commissions'                         Time:   31 ms
INFO    : Operation: 'beancount.plugins.check_commodity'             Time:   35 ms
INFO    : Operation: 'beancount.ops.balance'                         Time: 1819 ms
INFO    : Operation: 'function: validate_open_close'                 Time:    7 ms
INFO    : Operation: 'function: validate_active_accounts'            Time:   46 ms
INFO    : Operation: 'function: validate_currency_constraints'       Time:   25 ms
INFO    : Operation: 'function: validate_duplicate_balances'         Time:   10 ms
INFO    : Operation: 'function: validate_duplicate_commodities'      Time:    5 ms
INFO    : Operation: 'function: validate_documents_paths'            Time:    5 ms
INFO    : Operation: 'function: validate_check_transaction_balances' Time:  257 ms
INFO    : Operation: 'function: validate_data_types'                 Time:  107 ms
INFO    : Operation: 'beancount.ops.validate'                        Time:  465 ms
INFO    : Operation: 'beancount.loader (total)'                      Time: 6586 ms

real    0m6.781s
user    0m6.708s
sys     0m0.068s

I sort of live with it (I mostly use the SQL commands now), but I'd be lying
if I said it doesn't annoy me. It used to be snappy and fast; I think beyond
2 seconds it starts to annoy me. I'm in a similar situation as you... getting
annoyed, but not enough to actually do anything about it yet.

> I'm awfully tempted to split up my file and use "include"s, for a number of
> reasons, but I've resisted up to this point because
>
>    - You use and advocate a single file.
>    - I'm afraid that I could forget to "include" one of my files and
>      never notice.
>    - It looks like the pickle cache is only a single, root-level cache.
> (Would it be possible to cache each of the included files separately, so
> that when I split my file into n pieces and edit only one piece, I still
> get the benefit of the cache for the other n-1 pieces?)

Some of that would be possible, but it's not trivial. I wanted to do this at
some point; here are some notes I took at the time:
https://bitbucket.org/blais/beancount/src/c5dfae27c8b598c1267a741b972388072618c1a7/TODO?at=default&fileviewer=file-view-default#TODO-2272

We could also do a run or two of profiling and take a couple of stabs at that
(not much has been done in that department so far, to be honest). Beyond a few
big cuts that way, I think the most often-called functions could be translated
to C, and that would probably make it much faster. I'll admit I'm more or less
in maintenance mode for a little while.

> Another option, which I've seen suggested for Ledger in the past, is to
> "close out" each year. That makes it harder to look at the complete history
> of a single account though.

Yes, and in the past I've made a lot of noise on the Ledger list about the
fact that this should be handled by the software... mainly because changing
data in past years would force you to regenerate all the close/open
transactions in all the files for all future years. I think that's not a
workable scenario. But you're right, it would circumvent the speed issue. I
think I have some code somewhere to generate those split transactions too.

Hmm, if there were a super fast way (e.g., in the parser, in C) to drop
transactions before a filter date and replace them with an equivalent opening
transaction (computed from a previous run), that could potentially offer an
on-the-fly version of this. Basically, can we build this as a feature, without
forcing the user to edit the input file, and would it be worth it?

> Matthew
--
You received this message because you are subscribed to the Google Groups
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/beancount/CAK21%2BhNgbbi4n7UoMTpR9JZgoX-%3DoMYUoi3E4v%3DW-vZM_YNHGQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
