On Wed, Aug 16, 2017 at 5:50 PM, Matthew Harris <[email protected]> wrote:
> How big are your input files,

$ bean-report $L stats-entries
Type         Num Entries
-----------  -----------
Transaction        12259
Price               4511
Balance             3525
Document           1699
Open                760
Event               308
Close               197
Commodity           128
Note                 78
Pad                  13
Query                 3
~Total~           23481
-----------  -----------

31677 postings. Very similar scale to yours.

> and how long does Beancount take to parse them?

bergamot:~$ time bean-check $L

real    0m6.882s
user    0m6.784s
sys     0m0.096s

bergamot:~$ time bean-check $L

real    0m0.508s
user    0m0.468s
sys     0m0.036s

On one of those:
https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc6i5syh.html
with 32GB of RAM, running Linux. (It's basically a souped-up laptop in a
small box.)

> My input file is 2.5M, 55k lines, 19515 directives (38017 postings in
> 13347 transactions). Running bean-web without a pickle cache takes about 30
> seconds to display my data on a MacBook Air. It's gotten to the point that
> it's rather painful to update and reprocess my file.

I suspect most of those 30 seconds are rendering time.
Try running this:

$ time bean-check -v $L
INFO    : Operation: 'beancount.parser.parser.parse_file'            Time:  651 ms
INFO    : Operation: 'beancount.parser.parser'                       Time:  651 ms
INFO    : Operation: 'beancount.ops.pad'                             Time:   71 ms
INFO    : Operation: 'beancount.ops.documents'                       Time:   86 ms
INFO    : Operation: 'beancount.plugins.ira_contribs'                Time:   26 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'             Time:  209 ms
INFO    : Operation: 'beancount.plugins.sellgains'                   Time:   26 ms
INFO    : Operation: 'washsales.commissions'                         Time:   31 ms
INFO    : Operation: 'beancount.plugins.check_commodity'             Time:   35 ms
INFO    : Operation: 'beancount.ops.balance'                         Time: 1819 ms
INFO    : Operation: 'function: validate_open_close'                 Time:    7 ms
INFO    : Operation: 'function: validate_active_accounts'            Time:   46 ms
INFO    : Operation: 'function: validate_currency_constraints'       Time:   25 ms
INFO    : Operation: 'function: validate_duplicate_balances'         Time:   10 ms
INFO    : Operation: 'function: validate_duplicate_commodities'      Time:    5 ms
INFO    : Operation: 'function: validate_documents_paths'            Time:    5 ms
INFO    : Operation: 'function: validate_check_transaction_balances' Time:  257 ms
INFO    : Operation: 'function: validate_data_types'                 Time:  107 ms
INFO    : Operation: 'beancount.ops.validate'                        Time:  465 ms
INFO    : Operation: 'beancount.loader (total)'                      Time: 6586 ms

real    0m6.781s
user    0m6.708s
sys     0m0.068s

I sort of live with it (I mostly use the SQL commands now), but I'd be lying
if I said it doesn't annoy me. It used to be snappy and fast; I think beyond
2 seconds it starts to annoy me. I'm in a similar situation as you... getting
annoyed, but not enough to actually do anything about it yet.

> I'm awfully tempted to split up my file and use "include"s, for a number of
> reasons, but I've resisted up to this point because
>
>    - You use and advocate a single file.
>    - I'm afraid that I could forget to "include" one of my files and
>      never notice.
>    - It looks like the pickle cache is only a single, root-level cache.
> (Would it be possible to cache each of the included files separately, so
> that when I split my file into n pieces and edit only one piece, I still
> get the benefit of the cache for the other n-1 pieces?)

Some of that would be possible, but it's not trivial. I wanted to do this at
some point; here are some notes I took at the time:
https://bitbucket.org/blais/beancount/src/c5dfae27c8b598c1267a741b972388072618c1a7/TODO?at=default&fileviewer=file-view-default#TODO-2272

We could also do a run or two of profiling and take a couple of stabs at that
(not much has been done in that department so far, to be honest). Beyond a few
big cuts that way, I think the most often-called functions could be translated
to C, and that would probably make it much faster. I'll admit I'm more or less
in maintenance mode for a little while.

> Another option, which I've seen suggested for Ledger in the past, is to
> "close out" each year. That makes it harder to look at the complete history
> of a single account though.

Yes, and in the past I've made a lot of noise on the Ledger list about the
fact that this should be handled by the software... mainly because changing
data in past years would force you to regenerate all the close/open
transactions in all the files for all future years. I think that's not a
workable scenario. But you're right, it would circumvent the speed issue. I
think I have some code somewhere to generate those split transactions too.

Hmm, if there were a super fast way (e.g., in the parser, in C) to drop
transactions before a filter date and replace them with an equivalent opening
transaction (computed from a previous run), that could potentially offer an
on-the-fly version of this. Basically, can we build this as a feature, without
forcing the user to edit the input file, and would it be worth it?

> Matthew
--
You received this message because you are subscribed to the Google Groups
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/beancount/CAK21%2BhNgbbi4n7UoMTpR9JZgoX-%3DoMYUoi3E4v%3DW-vZM_YNHGQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
