That was a great tour of Ledger's architecture, John, thanks for writing it up. It's also a nice guide for other implementors of Ledger-likes, and for documentors.

The strict testing of layering at link time is pretty neat. hledger's layering emerged as needed to avoid GHC "import cycle" errors. It's good to see the similarities between my layers and your (lower) layers. Our terminology has also become pretty consistent. Some time I should do a similar writeup following this format. Your post gives me some nice ideas and food for thought.

-Simon




On 3/13/12 10:45 PM, John Wiegley wrote:
Ledger is developed as a tiered set of functionality, where lower tiers know
nothing about the higher tiers.  In fact, I build multiple libraries during
the process, and link unit tests to these libraries, so that it is a link
error for a lower tier to violate this modularity.

Those tiers are:

  - Utility code

    There's lots of general utility in Ledger for doing time parsing, using
    Boost.Regex, error handling, etc.  It's all done in a way that can be
    reused in other projects as needed.

  - Commoditized Amounts (amount_t, commodity_t and friends)

    A numerical abstraction combining multi-precision rational numbers (via
    GMP) with commodities.  These structures can be manipulated like regular
    numbers in either C++ or Python (as Amount objects).
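
A minimal sketch of the idea in Python, assuming an exact rational quantity
paired with a commodity symbol.  The fields and error behavior are
assumptions made for illustration, not Ledger's actual amount_t or its
Python bindings; fractions.Fraction stands in for GMP.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Amount:
    """Hypothetical stand-in for a commoditized amount (not Ledger's API)."""
    quantity: Fraction
    commodity: str

    def __add__(self, other):
        # Adding amounts only makes sense within a single commodity.
        if not isinstance(other, Amount):
            return NotImplemented
        if other.commodity != self.commodity:
            raise ValueError("cannot add amounts of different commodities")
        return Amount(self.quantity + other.quantity, self.commodity)

    def __mul__(self, factor):
        # Scaling by a plain (non-commoditized) number keeps the commodity.
        return Amount(self.quantity * Fraction(factor), self.commodity)


# Exact arithmetic: one third plus two thirds is exactly one.
total = Amount(Fraction(1, 3), "USD") + Amount(Fraction(2, 3), "USD")
```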

  - Commodity Pool

    Commodities are all owned by a commodity pool, so that future parsing of
    amounts can link to the same commodity and establish a consistent price
    history and record of formatting details.
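
The ownership idea can be sketched as a simple interning table.  The names
CommodityPool, find_or_create and parse_amount are hypothetical here, and
tracking the widest precision seen is just one assumed example of a
"formatting detail" the pool might remember.

```python
class Commodity:
    """Hypothetical shared commodity record (illustrative only)."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.precision = 0   # widest number of decimal places seen so far
        self.prices = []     # placeholder for a price history; unused here


class CommodityPool:
    def __init__(self):
        self._commodities = {}

    def find_or_create(self, symbol):
        # Every parse of the same symbol yields the same shared object.
        if symbol not in self._commodities:
            self._commodities[symbol] = Commodity(symbol)
        return self._commodities[symbol]

    def parse_amount(self, text):
        # Accepts only the simple form "<number> <symbol>", e.g. "12.50 USD".
        number, symbol = text.split()
        commodity = self.find_or_create(symbol)
        decimals = len(number.partition(".")[2])
        commodity.precision = max(commodity.precision, decimals)
        return number, commodity


pool = CommodityPool()
_, usd_a = pool.parse_amount("12.5 USD")
_, usd_b = pool.parse_amount("3.125 USD")
```

Both parses return the same Commodity object, so the formatting detail
learned from the second amount is visible through the first.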

  - Balances

    Adds the concept of multiple amounts with varying commodities.  Supports
    simple arithmetic, and multiplication and division with non-commoditized
    values.
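
As a rough sketch, a balance can be modelled as a map from commodity to
quantity.  The Balance class below is an assumption for illustration and
does not mirror Ledger's balance_t interface.

```python
from fractions import Fraction


class Balance:
    """Hypothetical multi-commodity balance (illustrative only)."""

    def __init__(self, quantities=None):
        # Maps commodity symbol -> exact quantity.
        self.quantities = dict(quantities or {})

    def add(self, quantity, commodity):
        # Amounts in different commodities accumulate side by side.
        merged = dict(self.quantities)
        merged[commodity] = merged.get(commodity, Fraction(0)) + Fraction(quantity)
        return Balance(merged)

    def __mul__(self, factor):
        # Only plain (non-commoditized) numbers may scale a balance.
        return Balance({c: q * Fraction(factor) for c, q in self.quantities.items()})

    def __truediv__(self, divisor):
        return Balance({c: q / Fraction(divisor) for c, q in self.quantities.items()})


bal = Balance().add(10, "USD").add(5, "EUR").add(2, "USD")
halved = bal / 2
```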

  - Price history

    Amounts have prices, and these are kept in a data graph which the amount
    code itself is only dimly aware of (there are three points of access so an
    amount can query its revalued price on a given date).

  - Values

    Often the higher layers in Ledger don't care if something is an amount or a
    balance; they just want to add stuff to it or print it.  For this, I
    created a type-erasure class, value_t/Value, into which many things can be
    stuffed and then operated on.  They can contain amounts, balances, dates,
    strings, etc.  If you try to apply an operation between two values that
    makes no sense (like dividing an amount by a balance), an error occurs at
    runtime, rather than at compile-time (as would happen if you actually tried
    to divide an amount_t by a balance_t).

    This is the core data type for the value expression language.
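
The runtime-versus-compile-time point can be illustrated with a tiny
wrapper.  This Value class is a hypothetical sketch, not value_t: plain
Python numbers and strings stand in for amounts and balances, and the
error type is an arbitrary choice.

```python
class Value:
    """Hypothetical type-erased wrapper (not Ledger's value_t)."""

    def __init__(self, payload):
        self.payload = payload

    def __truediv__(self, other):
        # The check happens when the operation runs, not when code is written.
        try:
            return Value(self.payload / other.payload)
        except TypeError as exc:
            raise RuntimeError(
                f"cannot divide {type(self.payload).__name__} "
                f"by {type(other.payload).__name__}"
            ) from exc

    def to_string(self):
        return str(self.payload)


quotient = Value(10) / Value(4)

# A nonsensical combination is only detected at runtime.
try:
    Value(10) / Value("ten")
    failed = False
except RuntimeError:
    failed = True
```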

  - Value expressions

    The next layer up adds functions and operators around the Value concept.
    This lets you apply transformations and tests to Values at runtime without
    having to bake it into C++.  The set of functions available is defined by
    each object type in Ledger (posts, accounts, transactions, etc.), though
    the core engine knows nothing about these.  At its base, it only knows how
    to apply operators to values, and how to pass them to and receive them from
    functions.

  - Query expressions

    Expressions can be onerous to type at the command-line, so there's a
    shorthand for reporting called "query expressions".  These add no
    functionality of their own, but are purely translated from the input string
    (cash) down to the corresponding value expression (account =~ /cash/).
    This is a convenience layer.
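
The translation in that example can be sketched as a one-line rewrite.  The
function name query_to_expr is hypothetical, and only the single bare-term
case shown above is handled; how multiple terms or other query forms are
translated is deliberately left out.

```python
def query_to_expr(term):
    """Translate one bare query term into a value-expression string.

    Only the single-term case from the example is handled; anything else
    is rejected rather than guessed at.
    """
    term = term.strip()
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("this sketch handles exactly one bare term")
    return f"account =~ /{term}/"


expr = query_to_expr("cash")
```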

  - Format strings

    Format strings let you interpolate value expressions into a string, with the
    requirement that any interpolated value have a string representation.
    Really all this does is calculate the value expression in the current
    report context, call the resulting value's "to_string()" method, and stuff
    the result into the output string.  It also provides printf-like behavior,
    such as min/max width, right/left justification, etc.
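
A minimal sketch of that interpolation step, using a made-up directive
syntax %[-][min][.max](name) chosen only for this example (it is not
claimed to be Ledger's format syntax).  A dictionary lookup stands in for
evaluating a value expression in the report context.

```python
import re

# Hypothetical directive syntax: %[-][min][.max](name)
_DIRECTIVE = re.compile(r"%(-?)(\d*)(?:\.(\d+))?\((\w+)\)")


def render(fmt, context):
    """Expand each directive using values looked up in `context`."""

    def expand(match):
        left, min_width, max_width, name = match.groups()
        text = str(context[name])          # stands in for to_string()
        if max_width:
            text = text[: int(max_width)]  # truncate to the maximum width
        if min_width:
            width = int(min_width)
            # "-" left-justifies; the default is right-justification.
            text = text.ljust(width) if left else text.rjust(width)
        return text

    return _DIRECTIVE.sub(expand, fmt)


line = render("%-10(account)|%8(amount)|%.3(payee)",
              {"account": "Cash", "amount": "12.50", "payee": "Grocer"})
```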

  - Journal items

    Next is a base type shared by anything that can appear in a journal: an
    item_t.  It contains details common to all such parsed entities, like what
    file and line it was found on, etc.

  - Journal posts

    The most numerous object found in a Journal, postings are a type of item
    that contain an account, an amount, a cost, and metadata.  There are some
    other complications, like the account can be marked virtual, the amount
    could be an expression, etc.

  - Journal transactions

    Postings are owned by transactions, always.  This subclass of item_t knows
    about the date, the payee, etc.  If a date or metadata tag is requested
    from a posting and it doesn't have that information, the transaction is
    queried to see if it can provide it.
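
The fallback from posting to transaction can be sketched like this.  The
class and method names (Posting, Transaction, effective_date, get_tag) are
assumptions for illustration, not the members of Ledger's post_t or xact_t.

```python
import datetime
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transaction:
    date: datetime.date
    payee: str
    tags: dict = field(default_factory=dict)


@dataclass
class Posting:
    xact: Transaction                        # the owning transaction
    account: str
    own_date: Optional[datetime.date] = None
    tags: dict = field(default_factory=dict)

    def effective_date(self):
        # Prefer the posting's own date; otherwise ask the transaction.
        return self.own_date if self.own_date is not None else self.xact.date

    def get_tag(self, name):
        # Same fallback for metadata tags.
        if name in self.tags:
            return self.tags[name]
        return self.xact.tags.get(name)


xact = Transaction(datetime.date(2012, 3, 13), "Grocer", {"trip": "yes"})
post = Posting(xact, "Expenses:Food")
```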

  - Journal accounts

    Postings are also shared by accounts, though the actual memory is managed
    by the transaction.  Each account knows all the postings within it, but
    contains relatively little information of its own.

  - The Journal object

    Finally, all transactions with their postings, and all accounts, are owned
    by a journal_t object.  This is the go-to object for querying and reporting
    on your data.

  - Textual journal parser

    There is a textual parser, wholly contained in textual.cc, which knows how
    to parse text into journal objects, which then get "finalized" and added to
    the journal.  Finalization is the step that enforces the double-entry
    guarantee.
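
A heavily simplified sketch of what such a check could look like, assuming
that "balanced" means the postings sum to zero within each commodity.  The
function name and tuple layout are hypothetical, and anything else that
finalization might do (costs, inferred amounts) is out of scope here.

```python
from collections import defaultdict
from fractions import Fraction


def finalize(postings):
    """Check that postings balance.

    `postings` is a list of (account, quantity, commodity) tuples.
    Illustrative only; raises ValueError when a commodity does not net to zero.
    """
    totals = defaultdict(Fraction)
    for _account, quantity, commodity in postings:
        totals[commodity] += Fraction(quantity)
    unbalanced = {c: q for c, q in totals.items() if q != 0}
    if unbalanced:
        raise ValueError(f"transaction does not balance: {unbalanced}")
    return True


ok = finalize([("Expenses:Food", "12.50", "USD"),
               ("Assets:Cash", "-12.50", "USD")])
```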

  - Iterators

    Every journal object is "iterable", and these iterators are defined in
    iterators.h and iterators.cc.  This iteration logic is kept out of the
    basic journal objects themselves for the sake of modularity.

  - Comparators

    Another abstraction isolated to its own layer, this class encapsulates the
    comparison of journal objects, based on whatever value expression the user
    passed to --sort.

  - Temporaries

    Many reports bring pseudo-journal objects into existence, like postings
    which report totals in a "<Total>" account.  These objects are created and
    managed by a temporaries_t object, which gets used in many places by the
    reporting filters.

  - Option handling

    There is an option handling subsystem used by many of the layers further
    down.  It makes it relatively easy for me to add new options, and to have
    those option settings immediately accessible to value expressions.

  - Session objects

    Every journal object is owned by a session, with the session providing
    support for that object.  In GUI terms, this is the Controller object for
    the journal Data object, where every document window would be a separate
    session.  They are all owned by the global scope.

  - Report objects

    Every time you create report output, a report object is created to
    determine what you want to see.  In the Ledger REPL, a new report object is
    created every time a command is executed.  In CLI mode, only one report
    object ever comes into being, as Ledger immediately exits after displaying
    the results.

  - Reporting filters

    The way Ledger generates data is this: it asks the session for the current
    journal, and then creates an iterator applied to that journal.  The kind of
    iterator depends on the type of report.

    This iterator is then walked, and every object yielded from the iterator is
    passed to an "item handler", whose type is directly related to the type of
    the iterator.

    There are many, many item handlers, which can be chained together.  Each
    one receives an item (post, account, xact, etc.), performs some action on
    it, and then passes it down to the next handler in the chain.  There are
    filters which compute the running totals; that queue and sort all the input
    items before playing them back out in a new order; that filter out items
    which fail to match a predicate, etc.  Almost every reporting feature in
    Ledger is related to one or more filters.  Looking at filters.h, I see over
    25 of them defined currently.
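
The chaining pattern described here can be sketched as follows.  All class
and method names (Handler, handle, flush, FilterBy, SortBy, RunningTotal,
Collect) are hypothetical and chosen for the example; they are not the
handlers declared in filters.h, and dictionaries stand in for postings.

```python
class Handler:
    """Base handler: forwards items and flush requests down the chain."""

    def __init__(self, next_handler=None):
        self.next = next_handler

    def handle(self, item):
        if self.next:
            self.next.handle(item)

    def flush(self):
        if self.next:
            self.next.flush()


class FilterBy(Handler):
    """Drops items that fail a predicate."""

    def __init__(self, predicate, next_handler):
        super().__init__(next_handler)
        self.predicate = predicate

    def handle(self, item):
        if self.predicate(item):
            super().handle(item)


class SortBy(Handler):
    """Queues everything, then replays it in sorted order on flush."""

    def __init__(self, key, next_handler):
        super().__init__(next_handler)
        self.key = key
        self.queue = []

    def handle(self, item):
        self.queue.append(item)

    def flush(self):
        for item in sorted(self.queue, key=self.key):
            super().handle(item)
        self.queue.clear()
        super().flush()


class RunningTotal(Handler):
    """Annotates each item with the total seen so far."""

    def __init__(self, next_handler):
        super().__init__(next_handler)
        self.total = 0

    def handle(self, item):
        self.total += item["amount"]
        super().handle({**item, "total": self.total})


class Collect(Handler):
    """End of the chain: just remembers what it was given."""

    def __init__(self):
        super().__init__(None)
        self.rows = []

    def handle(self, item):
        self.rows.append(item)


sink = Collect()
chain = FilterBy(lambda p: p["account"].startswith("Expenses"),
                 SortBy(lambda p: p["amount"], RunningTotal(sink)))

for post in [{"account": "Expenses:Food", "amount": 30},
             {"account": "Assets:Cash", "amount": -40},
             {"account": "Expenses:Bus", "amount": 10}]:
    chain.handle(post)
chain.flush()
```

Because the sorting handler has to see every item before it can emit any,
the sketch needs a flush step in addition to per-item handling.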

  - The filter chain

    How filters get wired up, and in what order, is a complex process based on
    all the various options specified by the user.  This is the job of the
    chain logic, found entirely in chain.cc.  It took a really long time to get
    this logic exactly right, which is why I haven't exposed this layer to the
    Python bridge yet.

  - Output modules

    Although filters are great and all, in the end you want to see stuff.  This
    is the job of special "leaf" filters called output modules.  They are
    implemented just like a regular filter, but they don't have a "next" filter
    to pass the item on down to.  Instead, they are the end of the line and
    must do something with the item that results in the user seeing something
    on their screen or in a file.

  - Select queries

    Select queries know a lot about everything, even though they implement
    the user's query entirely in terms of the other features presented thus
    far.  Select queries have no functionality of their own; they are simply
    a shorthand to provide access to much of Ledger's functionality via a
    cleaner, more consistent syntax.

  - The Global Scope

    There is a master object which owns every other object, and this is
    Ledger's global scope.  It creates the other objects, provides REPL
    behavior for the command-line utility, etc.  In GUI terms, this is the
    Application object.

  - The Main Driver

    This creates the global scope object, performs error reporting, and handles
    command-line options which must precede even the creation of the global
    scope, such as --debug.

And that's Ledger in a nutshell.  All the rest are details, such as which
value expressions each journal item exposes, how many filters currently exist,
which options the report and session scopes define, etc.

John


