I forget: there's also a simple function to extract metadata from bean-query now:
https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/query/query_env.py?fileviewer=file-view-default#query_env.py-543 On Sun, May 22, 2016 at 2:06 PM, Martin Blais <[email protected]> wrote: > On Fri, May 20, 2016 at 1:45 AM, <[email protected]> wrote: > >> Hi, >> >> I'd like to share a way to query the journal in addition to bean-query. >> Using a simple python script, the beancount journal could be converted to >> JSON. Then one can use command line JSON tools, for example jq >> <https://stedolan.github.io/jq/>, to manipulate and query it. >> >> Here's the script: b2j.py >> <https://gist.github.com/xiaoruoruo/3fdf8d0f92b3fd85f259b35f888e4257>. >> >> For example: >> >> ./b2j.py test.beancount | jq '.[] | select (._type == "Event")' >> { >> "meta": { >> "lineno": 50 >> }, >> "date": "2013-01-01", >> "type": "employer", >> "description": "Ace Corp", >> "_type": "Event" >> } >> { >> "meta": { >> "lineno": 80 >> }, >> "date": "2016-01-01", >> "type": "employer", >> "description": "Bob Corp", >> "_type": "Event" >> } >> >> Another use case is to query for transactions with certain metadata, >> which currently bean-count doesn't support querying. It's probably very >> hard for jq to do balance reports, but it's quite good at filtering and >> outputting parts of the transactions. >> >> Hope this helps. >> > > Thanks for sharing. The same could be said about using a good old regular > SQL database to do the same, using XML with some Xpath querying tool, or > even using a sequence of protocol buffers stored in a file. At the end of > the day, we're the result of the little bit processing Beancount does on > its input is just a simple stream of data, and there are many ways in which > you could represent it externally and use available tools on it. > > There's already a converter from Beancount format to SQL which you can try > out: > > https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/scripts/sql.py > > I've also dabbled with the idea of extending SQLite3 to provide the > contents of a Beancount input file as a data source directly (it turns out > it's difficult to add the other things I need to replace bean-query). > > The problem is, there are a few legitimate reasons for a custom query tool > to exist, let me explain: > > Much of what is interesting about the accounting operations we carry out > is the ability to accumulate positions in inventories. For example, you can > deposit 100 CAD in an account (e.g., "cash") which holds 200 USD. The > result is never 300 anything... it's 200 USD & 100 CAD. SImilarly, you > could accumulate 10 shares of AAPL acquired at 88.00 USD in an account that > already contains a number of other existing lots of AAPL. The result should > be the list of those lots. > > Those types of operations are difficult to emulate using SQL or any other > generic tool. To replicate this, you'd have to _always_ imply an > aggregation clause by currency, i.e., a "GROUP BY currency". The cost > basis, acquisition dates, and labels complicate this further. It's very > inconvenient, I'm not even sure it can be done for accumulating balances. > > Note on the other hand that for aggregating a list of holdings at a > particular point in time, these are isomorphic to Posting objects, you can > flatten those into a list of columns and apart from some subtleties it can > be useful to do arithmetic on that. Moving to tables for holdings make > sense because usually all that needs to be done is to deal with total cost > basis and market value of positions. > > The way bean-query solves the query tool problem is by supporting the data > structures using in Beancount itself, namely, Amount, Position, and > Inventory. Inventory columns can accumulate Position columns, and you can > derive Amounts for the "units" or the "cost" of a Position or an Inventory. > > Furthermore, the basic data structure of Beancount, the Transaction, > always has a two-level hierarchy: the transaction and its attributes, and > its child postings. You could represent this as an SQL JOIN, but it would > be the very same every time. Yes, you could create a view, but I think > there's a space to make it easier. To be clear, here's what I mean (in a > simplified way): > > Transactions have date, flag, payee, narration > Postings have account, units (number and currency), cost (number, > currency, acquisition date, label), price (number and currency) > > These can be represented as tables, with a unique "id" field for > Transactions and a non-unique "parent_id" reference in the Postings table. > You could join on this id, to obtain this: > > date, flag, payee, narration, account, units_number, units_currency, > cost_number, ... > > bean-query deals with that for you every time, and provides short-hands to > render journals and balances for example. It'll get even easier over time, > as I want to add some niceties like implicit group-by clauses (there are > very few uses for aggregation which differs from that which can be implied > from its targets, e.g., in "SELECT account, month(date), sum(posting)" it's > already obvious that 1, 2 is the grouping key). > > What I want to do eventually is genericize bean-query to a more powerful > SQL processor that could > - Act on a multitude of data sources, one of which would the data from > Beancount. Others would include tables from HTML documents, attributes of > files in directories, and many others. > - Support complex custom data types defined in pure Python, like > Beancount's Amount, Position, and Inventory. > - Support repeated values generically, and use that on the Inventory type, > flattening those rows into multiple rows with a single Position in each > - Support hierarchical references. > - Support recursive SQL queries. > The ultimate purpose of this tool will be to reach beyond Beancount and > allow anyone to easily inject a command-line SQL processor onto many common > data sources, think of it as a swiss knife using SQL. This simple tool > would not be designed for scalability or high performance, but rather for > convenience. It could occupy some of the same space as Pandas does in the > Python world, and work on the command-line (SQL on the command-line is a > convenient way to express some expressions that would otherwise be > difficult using the battery of common UNIX tools). > > Anyhow, I hope this offers some perspective on why there's bean-query in > the first place. I'd love to be able to do without, but I don't think it's > possible. > > -- You received this message because you are subscribed to the Google Groups "Beancount" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/CAK21%2BhNNLaEGnYcUhRXUCj%3DdLi6i-5HpZq232fsDJdLrZ3nC3g%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
