Re: Querying the journal via JSON using jq

Martin Blais Sun, 22 May 2016 11:10:18 -0700

I forget: there's also a simple function to extract metadata from
bean-query now:


https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/query/query_env.py?fileviewer=file-view-default#query_env.py-543



On Sun, May 22, 2016 at 2:06 PM, Martin Blais <[email protected]> wrote:

> On Fri, May 20, 2016 at 1:45 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> I'd like to share a way to query the journal in addition to bean-query.
>> Using a simple python script, the beancount journal could be converted to
>> JSON. Then one can use command line JSON tools, for example jq
>> <https://stedolan.github.io/jq/>, to manipulate and query it.
>>
>> Here's the script: b2j.py
>> <https://gist.github.com/xiaoruoruo/3fdf8d0f92b3fd85f259b35f888e4257>.
>>
>> For example:
>>
>> ./b2j.py test.beancount | jq '.[] | select (._type == "Event")'
>> {
>>   "meta": {
>>     "lineno": 50
>>   },
>>   "date": "2013-01-01",
>>   "type": "employer",
>>   "description": "Ace Corp",
>>   "_type": "Event"
>> }
>> {
>>   "meta": {
>>     "lineno": 80
>>   },
>>   "date": "2016-01-01",
>>   "type": "employer",
>>   "description": "Bob Corp",
>>   "_type": "Event"
>> }
>>
>> Another use case is to query for transactions with certain metadata,
>> which currently bean-count doesn't support querying. It's probably very
>> hard for jq to do balance reports, but it's quite good at filtering and
>> outputting parts of the transactions.
>>
>> Hope this helps.
>>
>
> Thanks for sharing. The same could be said about using a good old regular
> SQL database to do the same, using XML with some Xpath querying tool, or
> even using a sequence of protocol buffers stored in a file. At the end of
> the day, we're the result of the little bit processing Beancount does on
> its input is just a simple stream of data, and there are many ways in which
> you could represent it externally and use available tools on it.
>
> There's already a converter from Beancount format to SQL which you can try
> out:
>
> https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/scripts/sql.py
>
> I've also dabbled with the idea of extending SQLite3 to provide the
> contents of a Beancount input file as a data source directly (it turns out
> it's difficult to add the other things I need to replace bean-query).
>
> The problem is, there are a few legitimate reasons for a custom query tool
> to exist, let me explain:
>
> Much of what is interesting about the accounting operations we carry out
> is the ability to accumulate positions in inventories. For example, you can
> deposit 100 CAD in an account (e.g., "cash") which holds 200 USD. The
> result is never 300 anything... it's 200 USD & 100 CAD. SImilarly, you
> could accumulate 10 shares of AAPL acquired at 88.00 USD in an account that
> already contains a number of other existing lots of AAPL. The result should
> be the list of those lots.
>
> Those types of operations are difficult to emulate using SQL or any other
> generic tool. To replicate this, you'd have to _always_ imply an
> aggregation clause by currency, i.e., a "GROUP BY currency". The cost
> basis, acquisition dates, and labels complicate this further. It's very
> inconvenient, I'm not even sure it can be done for accumulating balances.
>
> Note on the other hand that for aggregating a list of holdings at a
> particular point in time, these are isomorphic to Posting objects, you can
> flatten those into a list of columns and apart from some subtleties it can
> be useful to do arithmetic on that. Moving to tables for holdings make
> sense because usually all that needs to be done is to deal with total cost
> basis and market value of positions.
>
> The way bean-query solves the query tool problem is by supporting the data
> structures using in Beancount itself, namely, Amount, Position, and
> Inventory. Inventory columns can accumulate Position columns, and you can
> derive Amounts for the "units" or the "cost" of a Position or an Inventory.
>
> Furthermore, the basic data structure of Beancount, the Transaction,
> always has a two-level hierarchy: the transaction and its attributes, and
> its child postings. You could represent this as an SQL JOIN, but it would
> be the very same every time. Yes, you could create a view, but I think
> there's a space to make it easier. To be clear, here's what I mean (in a
> simplified way):
>
> Transactions have date, flag, payee, narration
> Postings have account, units (number and currency), cost (number,
> currency, acquisition date, label), price (number and currency)
>
> These can be represented as tables, with a unique "id" field for
> Transactions and a non-unique "parent_id" reference in the Postings table.
> You could join on this id, to obtain this:
>
>   date, flag, payee, narration, account, units_number, units_currency,
> cost_number, ...
>
> bean-query deals with that for you every time, and provides short-hands to
> render journals and balances for example. It'll get even easier over time,
> as I want to add some niceties like implicit group-by clauses (there are
> very few uses for aggregation which differs from that which can be implied
> from its targets, e.g., in "SELECT account, month(date), sum(posting)" it's
> already obvious that 1, 2 is the grouping key).
>
> What I want to do eventually is genericize bean-query to a more powerful
> SQL processor that could
> - Act on a multitude of data sources, one of which would the data from
> Beancount. Others would include tables from HTML documents, attributes of
> files in directories, and many others.
> - Support complex custom data types defined in pure Python, like
> Beancount's Amount, Position, and Inventory.
> - Support repeated values generically, and use that on the Inventory type,
> flattening those rows into multiple rows with a single Position in each
> - Support hierarchical references.
> - Support recursive SQL queries.
> The ultimate purpose of this tool will be to reach beyond Beancount and
> allow anyone to easily inject a command-line SQL processor onto many common
> data sources, think of it as a swiss knife using SQL. This simple tool
> would not be designed for scalability or high performance, but rather for
> convenience. It could occupy some of the same space as Pandas does in the
> Python world, and work on the command-line (SQL on the command-line is a
> convenient way to express some expressions that would otherwise be
> difficult using the battery of common UNIX tools).
>
> Anyhow, I hope this offers some perspective on why there's bean-query in
> the first place. I'd love to be able to do without, but I don't think it's
> possible.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/CAK21%2BhNNLaEGnYcUhRXUCj%3DdLi6i-5HpZq232fsDJdLrZ3nC3g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Querying the journal via JSON using jq

Reply via email to