Re: Beancount with large journals

Martin Blais Mon, 18 Feb 2019 13:46:13 -0800

On Mon, Feb 18, 2019 at 3:35 PM Shreedhar Hardikar <
[email protected]> wrote:


> Will the rewrite in C++ really help speed that much?
>

It should be well below 1s; a feeling of "instantaneous" is what I'm after.
In any case, I would write a quick prototype first to measure how long it
takes.


> I mean, C++ does come with a number of additional costs, and so do you
> believe ultimately that the benefit of C++ (execution speed) for an
> accounting tool like beancount, really outweighs those costs?
>

5-10 seconds of processing is just too much at the moment. Using the cache or
web interface offers a good workaround, but ultimately, even with the
cache, I find myself annoyed at how long it takes to process when I just
edit + save my file.

I'm with you about the hassle of C++ maintenance, but I want something
stable and for the ages, and simple.


> Here are some of my thoughts:
>
>    1. C++ cross-platform dependency management & build - I personally use
>    beancount on a FreeBSD system, and I do have to manually build it (even
>    when installing from pip) because there are some C/C++ library dependencies
>    for the parser etc. I can say that part is not very fun. If the entire thing
>    is written in C++, care would have to be taken to not use "fancy" C++
>    features because that means not being able to use on certain systems
>    (because they have older compilers or don't have the specific). Perhaps
>    bazel solves that?
>

I absolutely share your sentiment!
I would avoid unnecessary dependencies as much as possible, and my personal
C++ is closer to C (I generally avoid object-oriented programming and too
much overloading, and I use a small number of constructs very selectively).
The same way that my Python looks a bit "flat" - I'm using namedtuples
everywhere - the C++ in which this would be written would be very
conservative. While I might enjoy learning and fiddling with bleeding edge
features of C++, I recognize that it's a bit of an adolescent exercise in
bravado, and I'm very sensitive to the complexity they add so in that
codebase I'd avoid those, specifically for portability and ease of
long-term maintenance.

This is my main worry, and a central question: how much support should the
project offer for that? e.g. Should I have to support somebody coming with
a question about a compiler from 4 years ago on a platform I've never used?
(e.g. Arch) What about Windows support? Packaging?  I don't really have the
time. The parameters would be fairly narrow (Debian/Ubuntu, recent
compilers). On the other hand, I'd be shooting for simple dependencies
which have had a lot of testing (mostly open source google tech -- ABSL is
designed for the long-term).

I'm not sure yet. That's why I'd like to build a prototype with the more
recent tools and share it for people to try.



>    2. Ease of development & hacking on the code - One prime reason I
>    chose beancount over ledger was the fact that the data structures and
>    algorithms used were written in Python and so easier to grok. I am fairly
>    adept in C++, but running through .h & .cpp & Make & inheritance
>    hierarchies is much more work in C++ than other languages. It was difficult
>    for me to follow along the datatypes available in ledger and how the python
>    integration really worked. I mean, perhaps some more documentation would
>    have helped. Also C++ bugs may give segfaults a lot more often than python
>    code does - a different beast than the stack trace bugs in python. I'm not
>    saying it's not possible to write seg-fault-free code. It gets harder very
>    fast as the complexity goes up.
>

Absolutely. I would write C++ code that is mostly free of classes (not
object-oriented), and would use exactly the same schema as I do now --
number, amount, position, posting, inventory. "Naked" data structures where
everything is public and if not immutable, in practice, used as if it were.
I'd probably use Protocol Buffers to define and represent those in memory,
along with a library of (stateless) functions to replace beancount/core. I
would basically just mirror the schema that I'm using now (I think it does
the job well) but in protos and C++ functions. It wouldn't be a redesign,
mainly a rewrite, fixing some things along the way (e.g.
tolerance/precision). Anyone already familiar with the
beancount.core.data/position/inventory would immediately feel at home.
Moreover, the support for Python would be first-class: All the unit tests
would remain in Python, and I really care about being able to put things
together in a quick Python script myself, so I would make sure that this
works well. (I have yet to experiment with CLIF, so that's
something I still have to assess.)
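
To make the "naked data structures plus stateless functions" style above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the field names are simplified, and the helpers `add_amount` and `balance` are hypothetical names invented for this example, not Beancount's actual definitions or the planned C++/protobuf schema.

```python
from decimal import Decimal
from typing import Iterable, NamedTuple, Optional, Tuple


class Amount(NamedTuple):
    number: Decimal
    currency: str


class Position(NamedTuple):
    units: Amount
    cost: Optional[Amount] = None  # per-unit cost, if the lot is held at cost


class Posting(NamedTuple):
    account: str
    units: Amount


# Assumption for this sketch: an inventory is an immutable tuple of positions.
Inventory = Tuple[Position, ...]


def add_amount(inventory: Inventory, amount: Amount) -> Inventory:
    """Return a new inventory with `amount` merged in; the input is not modified."""
    merged = []
    found = False
    for pos in inventory:
        same_lot = pos.cost is None and pos.units.currency == amount.currency
        if same_lot and not found:
            found = True
            total = pos.units.number + amount.number
            if total != 0:  # drop positions that net out to zero
                merged.append(Position(Amount(total, amount.currency)))
        else:
            merged.append(pos)
    if not found and amount.number != 0:
        merged.append(Position(amount))
    return tuple(merged)


def balance(postings: Iterable[Posting]) -> Inventory:
    """Fold a sequence of postings into an inventory, one stateless step at a time."""
    inventory: Inventory = ()
    for posting in postings:
        inventory = add_amount(inventory, posting.units)
    return inventory
```

Everything is public and immutable, and all behaviour lives in plain functions, which is the property that would carry over to protobuf messages plus a library of C++ functions.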


>
>    3. Also, I'm not sure of what design you have in mind, but if you are
>    going to expose Python bindings for plugins (which, according to the docs,
>    is a fundamental part of beancount's extension model), won't you need to be
>    constantly converting between Python objects & C++ objects anyway? That
>    might nullify all the benefits from C++. Caveat here: I'm not very
>    familiar with Python/C++ bindings, there may be a way to do this
>    efficiently. And maybe google/clif solves that problem superbly.
>

Good point. Something to keep in mind indeed. I've seen cases where
crossing the language barrier (e.g., between Go and C++) was done by
serializing entire messages and deserializing them on the other side, which
is (relatively) slow. (Go maintains its own copy of the representation in its
runtime, which offers advantages.) If I recall, protobufs have two targets
for Python bindings: one that is purely Python using some generic C library
calls, and a "protoc" one that manipulates the C objects directly (I'm not
sure, I need to dig in the details). The latter would be cheap to send
across Python/C. That's something I'd test for sure when writing a
prototype (a cheap cross-language barrier - passing a pointer - would be
ideal). This is one of the reasons why I think the core data structures,
parser, booking code, ops and the main plugins would be C++. Plugins would
probably have to be written in C++ (though the API would be very simple, as
it is now), but if possible a Python API for them would also be there (it
might just slow down your processing a bit). Some of the functionality
that's currently there as plugins might also be required by default (e.g.
requiring commodities to be declared, implicit prices), so maybe fewer
plugins and options where it makes sense.
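
The difference between the two kinds of boundary crossing discussed above can be sketched in a few lines of Python. This is a toy model only: `json` stands in for protobuf encoding, and the two `cross_by_*` functions are hypothetical names for this example. Neither protobuf nor CLIF is actually involved here.

```python
import json


def cross_by_serializing(entries):
    """Boundary that copies: encode on one side, decode on the other."""
    wire = json.dumps(entries)   # cost grows with the size of the data
    return json.loads(wire)      # the receiver gets an independent copy


def cross_by_reference(entries):
    """Boundary that shares: both sides see the very same object."""
    return entries               # constant cost, comparable to passing a pointer


entries = [{"account": "Assets:Cash", "number": "10.00", "currency": "USD"}]
copied = cross_by_serializing(entries)
shared = cross_by_reference(entries)
```

Here `copied` is equal to `entries` but is a distinct object, while `shared` is the same object. Timing the two functions on a large list with the standard `timeit` module is one way to get a feel for the overhead that a prototype would need to measure for real bindings.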



> Finally, I reckon that you can gain a lot in execution speed by
> using another compiled language. Have you considered Go? It should give much
> faster execution speeds of integers/decimals with easier development,
> maintenance (and package management) etc. Caveat here: I have not used Go
> very much, that is, I know only basics, and what I've heard from others. It
> may work really well to solve the problem beancount is facing in an elegant
> manner.
>

I have a lot to say about Go -- I've led a team for a few years where we
implemented a project entirely in Go from scratch -- I know it very well. I
don't really want to get into the details (no time or place here), but Go is
not my favorite choice for this project and I won't be implementing it in
Go. One of the good things about this redesign idea is that this new base
(the "first third") of Beancount would output a stream of protocol buffer
objects. These could be parsed and processed in any language (Go included).

Ultimately, my goal is to have to maintain only about 1/3rd of the current
codebase so I have enough cycles to improve the core features and focus on
that. The interface/web activity has already migrated to Fava, and the hope is
that a generalized query framework that operates on any type of data might
take wings of its own.


> Anyway, I do hope you take these points in good spirit - as they were well
> intentioned. Beancount is a great product and I can't wait till it gets
> even better with all the features you listed out here!
>

Absolutely!




>
> Thanks,
> Shreedhar
>
> On Mon, Feb 18, 2019 at 12:22 PM Martin Blais <[email protected]> wrote:
>
>> On Thu, Feb 14, 2019 at 2:44 AM Stefano Zacchiroli <[email protected]>
>> wrote:
>>
>>> On Sun, Feb 10, 2019 at 11:07:03PM -0500, Martin Blais wrote:
>>> > You can view the breakdown in time with the -v option to bean-check:
>>>
>>> You've probably already thought about that, so out of curiosity: how
>>> much of this is potentially parallelizable, as an avenue for "easily"
>>> getting a performance boost? I guess not much, due to either I/O
>>> constraints or the GIL lock, right? I'm curious about whether
>>> validation, booking, and plugins might be made parallelizable in the
>>> future.
>>>
>>
>> None.
>> It's a sequential process.
>> Something that /might/ have an impact is to sequence all the operations
>> as a chain of streams consuming each other (think: generators/iterators),
>> for memory locality, but at this (small) scale I doubt it would make any
>> difference TBH. Some of the plugins do multiple passes over the stream,
>> which makes this not work and would require pirouettes to harvest
>> opportunities for reusing already computed quantities (e.g. results of
>> stuff from getters.py)
>>
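
The "chain of streams" idea in the paragraph above, and the reason multi-pass plugins break it, can be sketched with Python generators. This is a minimal illustration under stated assumptions: the stage names, the dictionary-based entries and the one-line input format are all made up for the example and are not Beancount's loader or plugin API.

```python
from decimal import Decimal


def parse(lines):
    """Yield one entry per input line, lazily."""
    for line in lines:
        date, account, number = line.split()
        yield {"date": date, "account": account, "number": Decimal(number)}


def validate(entries):
    """Single-pass stage: checks each entry as it flows through."""
    roots = ("Assets", "Liabilities", "Equity", "Income", "Expenses")
    for entry in entries:
        if not entry["account"].startswith(roots):
            raise ValueError("invalid account: " + entry["account"])
        yield entry


def running_balance(entries):
    """Single-pass stage: annotates each entry with the balance so far."""
    total = Decimal(0)
    for entry in entries:
        total += entry["number"]
        yield {**entry, "balance": total}


def share_of_total(entries):
    """Two-pass stage: needs the grand total first, so it buffers the whole stream."""
    buffered = list(entries)  # the streaming benefit is lost at this point
    grand_total = sum((e["number"] for e in buffered), Decimal(0))
    for entry in buffered:
        yield {**entry, "share": entry["number"] / grand_total}


lines = ["2019-02-18 Assets:Cash 10", "2019-02-19 Assets:Cash 30"]
result = list(share_of_total(running_balance(validate(parse(lines)))))
```

The first three stages each hold only one entry at a time. The last one has to materialize the full list before it can emit anything, which is the kind of multi-pass step that prevents a pure streaming design.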
>> No, I think what should be done for the next major release is a rewrite.
>> At the very coarse level, it looks like this in my mind:
>> - Beancount reports/web gets deleted in favor of Fava.
>> - Beancount query/SQL gets forked to a separate project operating on
>> arbitrary schemas (via protobufs as common representation for various
>> sources of data) and has support for Beancount integration (e.g. a Decimal
>> type, and simple aggregators with the semantics of
>> beancount.core.Inventory/Position/Amount). That's all that's needed, and it
>> would enable the query language to work on CSV files and other data
>> sources. Moreover, this version would be tested properly, and have data
>> types in its compiler (no exceptions at runtime).
>> - Beancount core, parser, booking and plugins get rewritten in simple C++
>> (no boost/templates, but rather on top of a bazel + absl + protobuf + clif
>> base with functional-style and a straightforward subset of C++, no
>> classes), providing its parsed and booked contents as a stream of protobuf
>> objects.
>> - All tests would remain in Python (I'm not rewriting those).
>> Comprehensive clean Python bindings for beancount.core would be provided,
>> to do as much scripting as is done today, except with types implemented
>> fully in C++.
>> - Moreover, all the big ticket items would have to be addressed, e.g.
>> explicitly setting the precision instead of inference, currency trading
>> accounts, reports of trades built-in, etc.
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Beancount" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/beancount/CAK21%2BhMXqd9sOAey%2B3aFDi6gh22B5bG8Y08E7CKa5WssWcryZg%40mail.gmail.com
>> <https://groups.google.com/d/msgid/beancount/CAK21%2BhMXqd9sOAey%2B3aFDi6gh22B5bG8Y08E7CKa5WssWcryZg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>

