Re: Querying the journal via JSON using jq

Martin Blais Sun, 22 May 2016 11:10:18 -0700

On Fri, May 20, 2016 at 1:45 AM, <[email protected]> wrote:

> Hi,
>
> I'd like to share a way to query the journal in addition to bean-query.
> Using a simple python script, the beancount journal could be converted to
> JSON. Then one can use command line JSON tools, for example jq
> <https://stedolan.github.io/jq/>, to manipulate and query it.
>
> Here's the script: b2j.py
> <https://gist.github.com/xiaoruoruo/3fdf8d0f92b3fd85f259b35f888e4257>.
>
> For example:
>
> ./b2j.py test.beancount | jq '.[] | select (._type == "Event")'
> {
>   "meta": {
>     "lineno": 50
>   },
>   "date": "2013-01-01",
>   "type": "employer",
>   "description": "Ace Corp",
>   "_type": "Event"
> }
> {
>   "meta": {
>     "lineno": 80
>   },
>   "date": "2016-01-01",
>   "type": "employer",
>   "description": "Bob Corp",
>   "_type": "Event"
> }
>
> Another use case is to query for transactions with certain metadata, which
> currently bean-count doesn't support querying. It's probably very hard for
> jq to do balance reports, but it's quite good at filtering and outputting
> parts of the transactions.
>
> Hope this helps.
>

Thanks for sharing. The same could be said about using a good old regular
SQL database to do the same, using XML with some Xpath querying tool, or
even using a sequence of protocol buffers stored in a file. At the end of
the day, we're the result of the little bit processing Beancount does on
its input is just a simple stream of data, and there are many ways in which
you could represent it externally and use available tools on it.

There's already a converter from Beancount format to SQL which you can try
out:
https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/scripts/sql.py

I've also dabbled with the idea of extending SQLite3 to provide the
contents of a Beancount input file as a data source directly (it turns out
it's difficult to add the other things I need to replace bean-query).

The problem is, there are a few legitimate reasons for a custom query tool
to exist, let me explain:

Much of what is interesting about the accounting operations we carry out is
the ability to accumulate positions in inventories. For example, you can
deposit 100 CAD in an account (e.g., "cash") which holds 200 USD. The
result is never 300 anything... it's 200 USD & 100 CAD. SImilarly, you
could accumulate 10 shares of AAPL acquired at 88.00 USD in an account that
already contains a number of other existing lots of AAPL. The result should
be the list of those lots.

Those types of operations are difficult to emulate using SQL or any other
generic tool. To replicate this, you'd have to _always_ imply an
aggregation clause by currency, i.e., a "GROUP BY currency". The cost
basis, acquisition dates, and labels complicate this further. It's very
inconvenient, I'm not even sure it can be done for accumulating balances.

Note on the other hand that for aggregating a list of holdings at a
particular point in time, these are isomorphic to Posting objects, you can
flatten those into a list of columns and apart from some subtleties it can
be useful to do arithmetic on that. Moving to tables for holdings make
sense because usually all that needs to be done is to deal with total cost
basis and market value of positions.

The way bean-query solves the query tool problem is by supporting the data
structures using in Beancount itself, namely, Amount, Position, and
Inventory. Inventory columns can accumulate Position columns, and you can
derive Amounts for the "units" or the "cost" of a Position or an Inventory.

Furthermore, the basic data structure of Beancount, the Transaction, always
has a two-level hierarchy: the transaction and its attributes, and its
child postings. You could represent this as an SQL JOIN, but it would be
the very same every time. Yes, you could create a view, but I think there's
a space to make it easier. To be clear, here's what I mean (in a simplified
way):

Transactions have date, flag, payee, narration
Postings have account, units (number and currency), cost (number, currency,
acquisition date, label), price (number and currency)

These can be represented as tables, with a unique "id" field for
Transactions and a non-unique "parent_id" reference in the Postings table.
You could join on this id, to obtain this:

date, flag, payee, narration, account, units_number, units_currency,
cost_number, ...

bean-query deals with that for you every time, and provides short-hands to
render journals and balances for example. It'll get even easier over time,
as I want to add some niceties like implicit group-by clauses (there are
very few uses for aggregation which differs from that which can be implied
from its targets, e.g., in "SELECT account, month(date), sum(posting)" it's
already obvious that 1, 2 is the grouping key).

What I want to do eventually is genericize bean-query to a more powerful
SQL processor that could
- Act on a multitude of data sources, one of which would the data from
Beancount. Others would include tables from HTML documents, attributes of
files in directories, and many others.
- Support complex custom data types defined in pure Python, like
Beancount's Amount, Position, and Inventory.
- Support repeated values generically, and use that on the Inventory type,
flattening those rows into multiple rows with a single Position in each
- Support hierarchical references.
- Support recursive SQL queries.
The ultimate purpose of this tool will be to reach beyond Beancount and
allow anyone to easily inject a command-line SQL processor onto many common
data sources, think of it as a swiss knife using SQL. This simple tool
would not be designed for scalability or high performance, but rather for
convenience. It could occupy some of the same space as Pandas does in the
Python world, and work on the command-line (SQL on the command-line is a
convenient way to express some expressions that would otherwise be
difficult using the battery of common UNIX tools).

Anyhow, I hope this offers some perspective on why there's bean-query in
the first place. I'd love to be able to do without, but I don't think it's
possible.

--
You received this message because you are subscribed to the Google Groups
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/beancount/CAK21%2BhNs%3DfO1FjWyyk-YAvecnkno_jYE6Hut3CUECSEjuPw2HA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Querying the journal via JSON using jq

Reply via email to