A word on Ledger structure

John Wiegley Tue, 13 Mar 2012 22:45:17 -0700

Ledger is developed as a tiered set of functionality, where lower tiers know
nothing about the higher tiers.  In fact, I build multiple libraries during
the process, and link unit tests to these libraries, so that it is a link
error for a lower tier to violate this modularity.


Those tiers are:

 - Utility code

   There's lots of general utility in Ledger for doing time parsing, using
   Boost.Regex, error handling, etc.  It's all done in a way that can be
   reused in other projects as needed.

 - Commoditized Amounts (amount_t, commodity_t and friends)

   A numerical abstraction combining multi-precision rational numbers (via
   GMP) with commodities.  These structures can be manipulated like regular
   numbers in either C++ or Python (as Amount objects).

 - Commodity Pool

   Commodities are all owned by a commodity pool, so that future parsing of
   amounts can link to the same commodity and establish a consistent price
   history and record of formatting details.

 - Balances

   Adds the concept of multiple amounts with varying commodities.  Supports
   simple arithmetic, and multiplication and division with non-commoditized
   values.

 - Price history

   Amounts have prices, and these are kept in a data graph which the amount
   code itself is only dimly aware of (there are three points of access so an
   amount can query its revalued price on a given date).

 - Values

   Often the higher layers in Ledger don't care if something is an amount or a
   balance; they just want to add stuff to it or print it.  For this, I
   created a type-erasure class, value_t/Value, into which many things can be
   stuffed and then operated on.  They can contain amounts, balances, dates,
   strings, etc.  If you try to apply an operation between two values that
   makes no sense (like dividing an amount by a balance), an error occurs at
   runtime, rather than at compile-time (as would happen if you actually tried
   to divide an amount_t by a balance_t).

   This is the core data type for the value expression language.

 - Value expressions

   The next layer up adds functions and operators around the Value concept.
   This lets you apply transformations and tests to Values at runtime without
   having to bake it into C++.  The set of functions available is defined by
   each object type in Ledger (posts, accounts, transactions, etc.), though
   the core engine knows nothing about these.  At its base, it only knows how
   to apply operators to values, and how to pass them to and receive them from
   functions.

 - Query expressions

   Expressions can be onerous to type at the command-line, so there's a
   shorthand for reporting called "query expressions".  These add no
   functionality of their own, but are purely translated from the input string
   (cash) down to the corresponding value expression (account =~ /cash/).
   This is a convenience layer.

 - Format strings

   Format strings let you interpolate value expressions into a string, with the
   requirement that any interpolated value have a string representation.
   Really all this does is calculate the value expression in the current
   report context, call the resulting value's "to_string()" method, and stuff
   the result into the output string.  It also provides printf-like behavior,
   such as min/max width, right/left justification, etc.

 - Journal items

   Next is a base type shared by anything that can appear in a journal: an
   item_t.  It contains details common to all such parsed entities, like what
   file and line it was found on, etc.

 - Journal posts

   The most numerous object found in a Journal, postings are a type of item
   that contains an account, an amount, a cost, and metadata.  There are some
   other complications, like the account can be marked virtual, the amount
   could be an expression, etc.

 - Journal transactions

   Postings are owned by transactions, always.  This subclass of item_t knows
   about the date, the payee, etc.  If a date or metadata tag is requested
   from a posting and it doesn't have that information, the transaction is
   queried to see if it can provide it.

 - Journal accounts

   Postings are also shared by accounts, though the actual memory is managed
   by the transaction.  Each account knows all the postings within it, but
   contains relatively little information of its own.

 - The Journal object

   Finally, all transactions with their postings, and all accounts, are owned
   by a journal_t object.  This is the go-to object for querying and reporting
   on your data.

 - Textual journal parser

   There is a textual parser, wholly contained in textual.cc, which knows how
   to parse text into journal objects, which then get "finalized" and added to
   the journal.  Finalization is the step that enforces the double-entry
   guarantee.

 - Iterators

   Every journal object is "iterable", and these iterators are defined in
   iterators.h and iterators.cc.  This iteration logic is kept out of the
   basic journal objects themselves for the sake of modularity.

 - Comparators

   Another abstraction isolated to its own layer, this class encapsulates the
   comparison of journal objects, based on whatever value expression the user
   passed to --sort.

 - Temporaries

   Many reports bring pseudo-journal objects into existence, like postings
   which report totals in a "<Total>" account.  These objects are created and
   managed by a temporaries_t object, which gets used in many places by the
   reporting filters.

 - Option handling

   There is an option handling subsystem used by many of the layers further
   down.  It makes it relatively easy for me to add new options, and to have
   those option settings immediately accessible to value expressions.

 - Session objects

   Every journal object is owned by a session, with the session providing
   support for that object.  In GUI terms, this is the Controller object for
   the journal Data object, where every document window would be a separate
   session.  They are all owned by the global scope.

 - Report objects

   Every time you create report output, a report object is created to
   determine what you want to see.  In the Ledger REPL, a new report object is
   created every time a command is executed.  In CLI mode, only one report
   object ever comes into being, as Ledger immediately exits after displaying
   the results.

 - Reporting filters

   The way Ledger generates data is this: it asks the session for the current
   journal, and then creates an iterator applied to that journal.  The kind of
   iterator depends on the type of report.

   This iterator is then walked, and every object yielded from the iterator is
   passed to an "item handler", whose type is directly related to the type of
   the iterator.

   There are many, many item handlers, which can be chained together.  Each
   one receives an item (post, account, xact, etc.), performs some action on
   it, and then passes it down to the next handler in the chain.  There are
   filters which compute the running totals; that queue and sort all the input
   items before playing them back out in a new order; that filter out items
   which fail to match a predicate, etc.  Almost every reporting feature in
   Ledger is related to one or more filters.  Looking at filters.h, I see over
   25 of them defined currently.

 - The filter chain

   How filters get wired up, and in what order, is a complex process based on
   all the various options specified by the user.  This is the job of the
   chain logic, found entirely in chain.cc.  It took a really long time to get
   this logic exactly right, which is why I haven't exposed this layer to the
   Python bridge yet.

 - Output modules

   Although filters are great and all, in the end you want to see stuff.  This
   is the job of special "leaf" filters called output modules.  They are
   implemented just like a regular filter, but they don't have a "next" filter
   to pass the item on down to.  Instead, they are the end of the line and
   must do something with the item that results in the user seeing something
   on their screen or in a file.

 - Select queries

   Select queries know a lot about everything, even though they implement
   their logic by expressing the user's query in terms of all the other
   features presented so far.  Select queries have no functionality of their
   own; they are simply a shorthand to provide access to much of Ledger's
   functionality via a cleaner, more consistent syntax.

 - The Global Scope

   There is a master object which owns every other object, and this is
   Ledger's global scope.  It creates the other objects, provides REPL
   behavior for the command-line utility, etc.  In GUI terms, this is the
   Application object.

 - The Main Driver

   This creates the global scope object, performs error reporting, and handles
   command-line options which must precede even the creation of the global
   scope, such as --debug.
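
To make the Values tier a little more concrete, here is a minimal sketch of a
type-erased value whose operations are checked at run time rather than at
compile time.  It is not Ledger's code: the "amount" and "value" types below
are hypothetical stand-ins, quantities are plain longs instead of GMP
rationals, and the rule that adding two different commodities promotes the
value to a balance is an assumption made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a commoditized amount (not Ledger's amount_t).
struct amount {
  long        quantity;
  std::string commodity;
};

// Hypothetical type-erased value: holds an amount, a balance, or a string.
class value {
public:
  enum kind_t { AMOUNT, BALANCE, STRING };

  explicit value(const amount& a) : kind_(AMOUNT) { bal_[a.commodity] = a.quantity; }
  explicit value(const std::string& s) : kind_(STRING), str_(s) {}

  kind_t kind() const { return kind_; }

  // Addition is validated at run time; mixing commodities yields a balance.
  value& operator+=(const value& rhs) {
    if (kind_ == STRING || rhs.kind_ == STRING)
      throw std::runtime_error("cannot add string values");
    for (const auto& entry : rhs.bal_)
      bal_[entry.first] += entry.second;
    if (bal_.size() > 1)
      kind_ = BALANCE;
    return *this;
  }

  // Division only makes sense when the divisor is a single amount.
  // The divisor's commodity is ignored and integer division is used,
  // both simplifications for this sketch.
  value& operator/=(const value& rhs) {
    if (kind_ == STRING || rhs.kind_ != AMOUNT)
      throw std::runtime_error("cannot divide by a non-amount value");
    assert(rhs.bal_.size() == 1);  // invariant: an amount holds one commodity
    const long divisor = rhs.bal_.begin()->second;
    if (divisor == 0)
      throw std::runtime_error("division by zero");
    for (auto& entry : bal_)
      entry.second /= divisor;
    return *this;
  }

  // Every kind has a string form, which is what format strings rely on.
  std::string to_string() const {
    if (kind_ == STRING)
      return str_;
    std::ostringstream out;
    for (auto it = bal_.begin(); it != bal_.end(); ++it) {
      if (it != bal_.begin())
        out << ", ";
      out << it->second << ' ' << it->first;
    }
    return out.str();
  }

private:
  kind_t                      kind_;
  std::map<std::string, long> bal_;  // commodity -> quantity
  std::string                 str_;
};
```

In this sketch, adding 5 USD to 10 USD stays an amount, adding 3 EUR on top
turns the value into a balance, and dividing an amount by that balance throws
std::runtime_error instead of failing to compile.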

And that's Ledger in a nutshell.  All the rest are details, such as which
value expressions each journal item exposes, how many filters currently exist,
which options the report and session scopes define, etc.
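
As an illustration of the reporting filters and output modules described
above, the following is a minimal sketch of a handler chain.  It is not taken
from filters.h or chain.cc: the names "post", "handler", "filter_by",
"running_total", "print_output" and "report" are hypothetical, amounts are
plain longs, and the wiring order is fixed by hand rather than derived from
user options.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a posting; real postings carry far more.
struct post {
  std::string account;
  long        amount;
  long        total;  // filled in by the running_total filter
};

// Base handler: receives an item and, by default, passes it down the chain.
struct handler {
  std::shared_ptr<handler> next;
  explicit handler(std::shared_ptr<handler> n = nullptr) : next(std::move(n)) {}
  virtual ~handler() = default;
  virtual void operator()(post& p) { if (next) (*next)(p); }
};

// Filter that drops items failing a predicate.
struct filter_by : handler {
  std::function<bool(const post&)> pred;
  filter_by(std::shared_ptr<handler> n, std::function<bool(const post&)> f)
    : handler(std::move(n)), pred(std::move(f)) {
    assert(static_cast<bool>(pred));  // a predicate must be supplied
  }
  void operator()(post& p) override { if (pred(p)) handler::operator()(p); }
};

// Filter that annotates each item with a running total.
struct running_total : handler {
  long sum;
  explicit running_total(std::shared_ptr<handler> n)
    : handler(std::move(n)), sum(0) {}
  void operator()(post& p) override {
    sum += p.amount;
    p.total = sum;
    handler::operator()(p);
  }
};

// Leaf "output module": has no next handler, it just writes the item out.
struct print_output : handler {
  std::ostream& out;
  explicit print_output(std::ostream& o) : out(o) {}
  void operator()(post& p) override {
    out << p.account << ' ' << p.amount << ' ' << p.total << '\n';
  }
};

// Wires filter -> running total -> output and walks the items through it.
std::string report(std::vector<post> posts, const std::string& needle) {
  std::ostringstream out;
  auto leaf   = std::make_shared<print_output>(out);
  auto totals = std::make_shared<running_total>(leaf);
  filter_by chain(totals, [&needle](const post& p) {
    return p.account.find(needle) != std::string::npos;
  });
  for (post& p : posts)
    chain(p);
  return out.str();
}
```

Calling report() with a list of postings and the string "Cash" passes only the
matching postings to the leaf, each with its running total attached.  Swapping
the order of the two filters would change what the totals mean, which hints at
why the wiring order matters.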

John
