Ledger is developed as a tiered set of functionality, where lower tiers know nothing about the higher tiers. In fact, I build multiple libraries during the process, and link unit tests to these libraries, so that it is a link error for a lower tier to violate this modularity.

Those tiers are:

- Utility code

  There's lots of general utility in Ledger for doing time parsing, using Boost.Regex, error handling, etc. It's all done in a way that can be reused in other projects as needed.

- Commoditized Amounts (amount_t, commodity_t and friends)

  A numerical abstraction combining multi-precision rational numbers (via GMP) with commodities. These structures can be manipulated like regular numbers in either C++ or Python (as Amount objects).

- Commodity Pool

  Commodities are all owned by a commodity pool, so that future parsing of amounts can link to the same commodity and establish a consistent price history and record of formatting details.

- Balances

  Adds the concept of multiple amounts with varying commodities. Supports simple arithmetic, and multiplication and division with non-commoditized values.

- Price history

  Amounts have prices, and these are kept in a data graph which the amount code itself is only dimly aware of (there are three points of access so an amount can query its revalued price on a given date).

- Values

  Often the higher layers in Ledger don't care if something is an amount or a balance; they just want to add stuff to it or print it. For this, I created a type-erasure class, value_t/Value, into which many things can be stuffed and then operated on. Values can contain amounts, balances, dates, strings, etc. If you try to apply an operation between two values that makes no sense (like dividing an amount by a balance), an error occurs at runtime, rather than at compile-time (as would happen if you actually tried to divide an amount_t by a balance_t). This is the core data type for the value expression language.

- Value expressions

  The next layer up adds functions and operators around the Value concept. This lets you apply transformations and tests to Values at runtime without having to bake them into C++.
  The set of functions available is defined by each object type in Ledger (posts, accounts, transactions, etc.), though the core engine knows nothing about these. At its base, it only knows how to apply operators to values, and how to pass them to and receive them from functions.

- Query expressions

  Expressions can be onerous to type at the command-line, so there's a shorthand for reporting called "query expressions". These add no functionality of their own, but are purely translated from the input string (cash) down to the corresponding value expression (account =~ /cash/). This is a convenience layer.

- Format strings

  Format strings let you interpolate value expressions into strings, with the requirement that any interpolated value have a string representation. Really all this does is calculate the value expression in the current report context, call the resulting value's "to_string()" method, and stuff the result into the output string. It also provides printf-like behavior, such as min/max width, right/left justification, etc.

- Journal items

  Next is a base type shared by anything that can appear in a journal: an item_t. It contains details common to all such parsed entities, like what file and line it was found on, etc.

- Journal posts

  The most numerous object found in a Journal, postings are a type of item that contains an account, an amount, a cost, and metadata. There are some other complications: the account can be marked virtual, the amount could be an expression, etc.

- Journal transactions

  Postings are owned by transactions, always. This subclass of item_t knows about the date, the payee, etc. If a date or metadata tag is requested from a posting and it doesn't have that information, the transaction is queried to see if it can provide it.

- Journal accounts

  Postings are also shared by accounts, though the actual memory is managed by the transaction. Each account knows all the postings within it, but contains relatively little information of its own.

- The Journal object

  Finally, all transactions with their postings, and all accounts, are owned by a journal_t object. This is the go-to object for querying and reporting on your data.

- Textual journal parser

  There is a textual parser, wholly contained in textual.cc, which knows how to parse text into journal objects, which then get "finalized" and added to the journal. Finalization is the step that enforces the double-entry guarantee.

- Iterators

  Every journal object is "iterable", and these iterators are defined in iterators.h and iterators.cc. This iteration logic is kept out of the basic journal objects themselves for the sake of modularity.

- Comparators

  Another abstraction isolated to its own layer, this class encapsulates the comparison of journal objects, based on whatever value expression the user passed to --sort.

- Temporaries

  Many reports bring pseudo-journal objects into existence, like postings which report totals in a "<Total>" account. These objects are created and managed by a temporaries_t object, which gets used in many places by the reporting filters.

- Option handling

  There is an option handling subsystem used by many of the layers further down. It makes it relatively easy for me to add new options, and to have those option settings immediately accessible to value expressions.

- Session objects

  Every journal object is owned by a session, with the session providing support for that object. In GUI terms, this is the Controller object for the journal Data object, where every document window would be a separate session. They are all owned by the global scope.

- Report objects

  Every time you create report output, a report object is created to determine what you want to see. In the Ledger REPL, a new report object is created every time a command is executed. In CLI mode, only one report object ever comes into being, as Ledger immediately exits after displaying the results.

- Reporting filters

  The way Ledger generates data is this: it asks the session for the current journal, and then creates an iterator applied to that journal. The kind of iterator depends on the type of report. This iterator is then walked, and every object yielded from the iterator is passed to an "item handler", whose type is directly related to the type of the iterator. There are many, many item handlers, which can be chained together. Each one receives an item (post, account, xact, etc.), performs some action on it, and then passes it down to the next handler in the chain. There are filters which compute the running totals; filters which queue and sort all the input items before playing them back out in a new order; filters which drop items that fail to match a predicate; etc. Almost every reporting feature in Ledger is related to one or more filters. Looking at filters.h, I see over 25 of them defined currently.

- The filter chain

  How filters get wired up, and in what order, is a complex process based on all the various options specified by the user. This is the job of the chain logic, found entirely in chain.cc. It took a really long time to get this logic exactly right, which is why I haven't exposed this layer to the Python bridge yet.

- Output modules

  Although filters are great and all, in the end you want to see stuff. This is the job of special "leaf" filters called output modules. They are implemented just like a regular filter, but they don't have a "next" filter to pass the item on down to. Instead, they are the end of the line and must do something with the item that results in the user seeing something on their screen or in a file.

- Select queries

  Select queries know a lot about everything, even though they implement their logic by implementing the user's query in terms of all the other features thus presented.
  Select queries have no functionality of their own; they are simply a shorthand to provide access to much of Ledger's functionality via a cleaner, more consistent syntax.

- The Global Scope

  There is a master object which owns every other object, and this is Ledger's global scope. It creates the other objects, provides REPL behavior for the command-line utility, etc. In GUI terms, this is the Application object.

- The Main Driver

  This creates the global scope object, performs error reporting, and handles command-line options which must precede even the creation of the global scope, such as --debug.

And that's Ledger in a nutshell. All the rest are details, such as which value expressions each journal item exposes, how many filters currently exist, which options the report and session scopes define, etc.

John
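
As a rough illustration of the reporting filters and output modules described above, here is a minimal Python sketch of a handler chain: items are pushed one at a time into the first handler, each handler acts and forwards to the next, and a "leaf" handler at the end produces the output. Everything in it is hypothetical and chosen for illustration only: the class names (ItemHandler, FilterPosts, SortPosts, RunningTotal, CollectOutput), the flush() hook, and the use of plain dicts with integer amounts as posts are assumptions, not Ledger's actual classes or its Python bridge.

```python
class ItemHandler:
    """Base handler (hypothetical): forwards each item to the next handler."""

    def __init__(self, next_handler=None):
        self.next_handler = next_handler

    def handle(self, item):
        if self.next_handler is not None:
            self.next_handler.handle(item)

    def flush(self):
        # Assumed hook: called once after the iterator has been exhausted.
        if self.next_handler is not None:
            self.next_handler.flush()


class FilterPosts(ItemHandler):
    """Drops items that fail to match a predicate."""

    def __init__(self, predicate, next_handler):
        super().__init__(next_handler)
        self.predicate = predicate

    def handle(self, item):
        if self.predicate(item):
            super().handle(item)


class SortPosts(ItemHandler):
    """Queues every item, then plays them back in sorted order on flush."""

    def __init__(self, key, next_handler):
        super().__init__(next_handler)
        self.key = key
        self.queue = []

    def handle(self, item):
        self.queue.append(item)

    def flush(self):
        for item in sorted(self.queue, key=self.key):
            super().handle(item)
        self.queue.clear()
        super().flush()


class RunningTotal(ItemHandler):
    """Annotates each item with the running total of amounts seen so far."""

    def __init__(self, next_handler):
        super().__init__(next_handler)
        self.total = 0

    def handle(self, item):
        self.total += item["amount"]
        super().handle({**item, "total": self.total})


class CollectOutput(ItemHandler):
    """Leaf "output module": has no next handler, it just records lines."""

    def __init__(self):
        super().__init__(None)
        self.lines = []

    def handle(self, item):
        self.lines.append(
            f"{item['account']:<16}{item['amount']:>6}{item['total']:>6}"
        )


# Made-up sample data; integer amounts keep the sketch simple.
POSTS = [
    {"account": "Expenses:Food", "amount": 20},
    {"account": "Assets:Cash", "amount": -20},
    {"account": "Assets:Cash", "amount": 50},
    {"account": "Income:Salary", "amount": -50},
]


def run_report(posts):
    """Wire up a fixed chain (filter -> sort -> total -> output) and run it."""
    output = CollectOutput()
    chain = FilterPosts(
        lambda post: "Cash" in post["account"],
        SortPosts(lambda post: post["amount"], RunningTotal(output)),
    )
    for post in posts:  # stands in for walking a journal iterator
        chain.handle(post)
    chain.flush()
    return output.lines


if __name__ == "__main__":
    for line in run_report(POSTS):
        print(line)
```

In this sketch the chain is wired by hand in run_report(). In Ledger, by contrast, which filters are used and in what order depends on the user's options, which is the job the text assigns to the chain logic in chain.cc.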