Re: Getting started; assigning accounts to bank .csv data

Martin Blais Tue, 02 Feb 2016 20:07:39 -0800

On Tue, Feb 2, 2016 at 10:48 PM, John Hendy <jw.he...@gmail.com> wrote:
>
> On Monday, February 1, 2016 at 10:41:26 PM UTC-6, Martin Blais wrote:
>
>> On Mon, Feb 1, 2016 at 1:13 PM, John Hendy <jw.h...@gmail.com> wrote:
>>
>>> Greetings,
>>>
>>>
>>> It's a fresh year and I've been seeing ledger come up on the Org-mode
>>> mailing list for some time and decided to give it a try. I'm coming
>>> from Moneydance and just wanted to get away from the tedious GUI
>>> method of adding information, as well as have flexibility to generate
>>> my own reports/visualizations with python or R, etc. [1]
>>>
>>> Consider that I'm about a week into reading through docs here and
>>> there during evenings. My first step was going to be importing a
>>> downloaded .csv from my bank to get started. I'm still trying to
>>> verify I get the terminology, so I'll use this from the manual:
>>>
>>> From 5.1 Basic format:
>>> ```
>>> This transaction has a date, a payee or description, a target account
>>> (the first posting), and a source account (the second posting). Each
>>> posting specifies what action is taken related to that account.
>>> ```
>>>
>>> From 7.2.1.2 The convert command:
>>> ```
>>> The fields ledger can recognize contain these case-insensitive strings
>>> date, posted, code, payee or desc or description, amount, cost, total,
>>> and note.
>>> ```
>>>
>>> For my purposes, I import my finances primarily to "categorize" (what
>>> I believe here is called adding an account) and assign a payee so that
>>> I can track my spending against a budget. So, I'm surprised there's no
>>> special column keyword I can add for "account". It appears that all I
>>> can do is pass, say, `--account "assets:checking"` to have ledger know
>>> it's against assets:checking? Is that correct?
>>>
>>> From trying to google "import csv account ledger" or similar
>>> variations, I've been surprised that the only tools to do something
>>> like this appear to be interactive one-trans-at-a-time programs like
>>> icsv2ledger and reckon (granted, they can learn or follow rules). I
>>> could quickly go through my bank's .csv and add exp:food:dining,
>>> exp:auto:fuel to my ~100 transactions a month and have those imported
>>> just like the other column data.
>>>
>>
>>
> Thanks for the awesome reply!
>
>
>> Keep in mind that part of the process of importing (they like to call it
>> "reconciling") involves
>> - Manually reviewing the transactions for correctness or fraud
>>
>
> I'll get there. For better or worse, I take the downloaded bank .csv as
> "truth" and am mostly interested in getting a better handle on what my
> money is used on, budgeting, planning, etc.
>
>
>> - Merging new transactions with previous transactions imported from the
>> other side (e.g. a payment from a bank account to pay off one's credit card
>> will typically be imported from both the bank AND credit card accounts; you
>> must merge the corresponding transactions together)
>>
>
> Definitely. Moneydance allowed me to input an account, which would "link"
> the transaction. Then I'd have to delete or merge the other account's
> record of the same transaction.
>


BTW, there are some ideas floating around for automatically merging two
incomplete transactions. This problem is the dual of the settlement-date
problem, i.e., the two sides of a transaction may settle on different
days.
See http://furius.ca/beancount/doc/proposal-settlement for some ruminations,
and scour the mailing list; there is more discussion about this there.
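
To give a rough idea of what such a matching heuristic could look like, here
is a minimal sketch in Python. It is not what Ledger or Beancount actually
do; the HalfTxn type, the pair_transfers function and the 3-day window are
all made up for illustration. It pairs each imported entry with one from a
different account whose amount is the exact negative and whose date is close
enough.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class HalfTxn:
    """One side of a transfer as imported from a single account (hypothetical type)."""
    account: str
    day: date
    amount: Decimal  # signed: negative means money leaving the account


def pair_transfers(entries, max_days=3):
    """Greedily pair entries from different accounts that have opposite
    amounts and dates at most max_days apart. Returns (pairs, unmatched)."""
    remaining = sorted(entries, key=lambda e: e.day)
    pairs = []
    unmatched = []
    while remaining:
        first = remaining.pop(0)
        match = None
        for other in remaining:
            if (other.day - first.day).days > max_days:
                break  # sorted by date, so nothing later can match either
            if other.account != first.account and other.amount == -first.amount:
                match = other
                break
        if match is None:
            unmatched.append(first)
        else:
            remaining.remove(match)
            pairs.append((first, match))
    return pairs, unmatched
```

A greedy match like this will get confused by two identical payments a day
apart, so you would still want to eyeball the pairs it proposes.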



>> - Assigning the right category (you can automate this with a script I
>> suppose; frankly it's not much work, I do all of mine manually with the
>> help of auto-completion from Emacs, which is the most important feature IMO)
>>
>
> Huh. Yes, I'll definitely have to look into the emacs mode. I assumed once
> it was in ledger format it would be *a lot* harder to navigate around vs.
> just doing it while it's already in a spreadsheet format.
>

Definitely not, text is there for your pleasure. You typically organize
your Ledger input file in the order that makes the most sense for you
(minus some constraints: Ledger reports transactions in the order they
appear in the file, and balance assertions are computed in that order too.
Beancount sorts everything by date, so order doesn't matter).



>> - Moving the resulting transactions to the right place in your file.
>>
>
> I'll have to look into this more. I get that this is the ledger list...
> but is beancount different in this respect? From reading your docs, it
> sounded like beancount didn't care about order. Or are there other reasons
> (besides date) that one would have to move transactions around?
>

In Ledger, the reporting is done in file order. Balance assertions as well.
In Beancount, order is by date, so you don't have to care about how you
organize them.
I think - but I'm not 100% sure - that most Ledger users must store their
input file by section, and in each section in date order, to minimize the
number of out-of-order transactions if they print out a register.
I use org-mode to create sections and each section is stored in date order
for some subset of accounts.



>> - Verifying balances visually, or inserting a balance directive which
>> asserts what the final account balance should be (for correctness) after
>> the new transactions.
>>
>> If you do it often enough and you have editing chops, you get used to the
>> dance and it's a breeze.
>> I think the fourth step can be hypothetically solved using heuristics.
>>
>>
>>
>>> I feel like I must be missing something with respect to getting the
>>> from/to accounts added to the bank data.
>>>
>>> Perhaps to take a step back...
>>> - are the majority of folks writing their transactions by hand in ledger
>>> format?
>>>
>>
>> Can't say about others, but for me I want to say that about half the
>> importing is semi-automatic.
>> - Credit cards and banks: I import from downloads but need to categorize
>> manually (as described above). Fairly good quality downloads.
>> - Investment accounts: buys are fully automated, but I need to manually edit
>> sales in some accounts. Great quality of downloads.
>> - Payroll stubs and vesting and a few other things are provided only as
>> PDFs and I don't bother trying to extract (though I've made some headway
>> towards this, it's incomplete; it turns out fully automating table
>> extraction from PDF isn't trivial. The best OSS solution is TabulaPDF by
>> far but you still need to manually identify where the table is).
>> - Cash transactions: I have to enter those by hand. I only book non-food
>> expenses as individual transactions directly, and for food maybe once every
>> six months I'll count my wallet balance and insert one transaction per
>> month to debit away the cash account toward food. If you do this, you end
>> up with surprisingly few transactions to book manually, maybe a
>> few/week. I suppose it could depend on lifestyle choices.
>>
>> It takes me less than 1 hour/week to run through the active accounts,
>> usually first thing Saturday morning when I get up. Most of the pain is
>> logging into the various institutions with user/passwords and clicking the
>> right buttons to generate the downloaded files. Extraction and filing is
>> automated using importers I wrote against LedgerHub. Less active accounts
>> are updated every quarter or when I feel like it.
>>
>>
> This is a helpful time estimate/reference. My main account (checking) has
> ~100 transactions per month. I don't mind categorizing them myself, but I
> hoped for a quick-ish way to do that. Typing "expenses:blah:blah" is pretty
> fast in a spreadsheet. While I *use* emacs, I'm no navigation whiz, and
> going to the right place in a block of text to type the same thing seems
> super tedious vs. a spreadsheet. Hence I was puzzled that I couldn't use
> ledger's convert command to just bring in accounts from the .csv along with
> the rest. After all, all the dates and amounts are there, one can add
> payees... why not accounts?
>

You can probably script that away with a few rules.
I admit that 100 txns/month is more than I have, and I might look into
auto-categorizing most of it myself if I were in that situation.
Problem is, everyone's little scripts appear to have little in common.
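
For what it's worth, here is a minimal sketch of the kind of script I mean,
in Python. Everything in it is an assumption for the sake of the example:
the column names (date, payee, amount, account), the rules, the account
names and the USD commodity. It also covers the spreadsheet workflow: if you
fill in an "account" column by hand, that value wins over the rules.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical rules: the first pattern that matches the payee wins.
RULES = [
    (re.compile(r"shell|exxon", re.I), "Expenses:Auto:Fuel"),
    (re.compile(r"pizza|cafe", re.I), "Expenses:Food:Dining"),
]
DEFAULT_ACCOUNT = "Expenses:Unknown"


def categorize(payee):
    """Return the account of the first rule matching the payee."""
    for pattern, account in RULES:
        if pattern.search(payee):
            return account
    return DEFAULT_ACCOUNT


def csv_to_ledger(text, source_account="Assets:Checking"):
    """Render CSV rows (columns: date, payee, amount) as Ledger-style entries.

    An optional 'account' column, if filled in, overrides the rules.
    The bank amount is negated for the category posting, so a withdrawal
    of -30.00 becomes a 30.00 expense; the source account is left elided.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        account = (row.get("account") or "").strip() or categorize(row["payee"])
        posting = -Decimal(row["amount"])
        entries.append(
            f"{row['date']} {row['payee']}\n"
            f"    {account}  {posting} USD\n"
            f"    {source_account}\n"
        )
    return "\n".join(entries)
```

Anything that lands in Expenses:Unknown you can then fix by hand in the
text file, which should be a small fraction of the 100.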



>>> - is there some better way to import bulk data (e.g. via ledger's
>>> convert function) and post-edit once it's in ledger format? It seemed
>>> a .csv in LO calc was pretty convenient vs. scrolling through a long
>>> text file
>>>
>>> - any other pointers along the above lines would be most welcome.
>>>
>>
>> Check out LedgerHub for ideas.
>>
>> Original design doc:
>> http://furius.ca/ledgerhub/doc/design
>>
>> Post-mortem:
>> http://furius.ca/ledgerhub/doc/postmortem
>>
>> The project is being killed right now, rewritten much better and simpler
>> and migrated into the Beancount project; if you do end up looking at the
>> code make sure you're checking out the "stable" branch, it's a bit of a
>> riot on the default branch right now, it will be broken.
>>
>> Essentially, I'm defining a config (in Python) as a list of "importer"
>> objects and boil the process down to three steps:
>> 1. Identify: Given a messy list of downloaded files (e.g. in
>> ~/Downloads), automatically identify which importer is supposed to handle
>> them
>> 2. Extract: Extracting transactions and statement date from each file, if
>> possible
>> 3. File: Filing away the downloads to a directory hierarchy which mirrors
>> the chart of accounts, for preservation, e.g. in a personal git repo.
>>
>> You could think of adding
>> 0. Fetch: Automatically download the files
>> but that's too hard. Personally I just don't have the stamina to
>> implement this for myself. Given the nature of today's websites and the
>> castles of JavaScript used to implement them, this would be a nightmare to
>> implement for too little payoff. I love the idea of full automation, but I
>> just don't have the time. Note that if you don't mind the nature of their
>> business (they sell your data), you could potentially try to use Yodlee to
>> pull much of it from a single place.
>>
>>
> Yeah, not interested in that. It's not a big deal to download the few
> files I need.
>
>
>> In any case, you can't really get away without writing at least some
>> code--it's just not realistic, the inputs from different people vary too
>> much. There's very little shared code out there (just basic code for CSV
>> files, like the tools you mention) and too few users share the same
>> accounts to generate the critical mass needed for reuse. A while back I
>> created the LedgerHub project to host shared importer code and provide a
>> framework for doing the above, but it never received many contributions and
>> honestly I didn't give it the care and attention to quality I should have.
>> More importantly, regression testing for those importers is most easily
>> carried out using actual downloaded files compared to a corresponding
>> expected output, but these files don't share well (they contain lots of
>> personal data) so one ends up with two repositories anyhow. And besides
>> there are several design decisions in some importers that may not please
>> every user, in particular about how you choose your accounts for
>> investments (there are degrees of freedom), so even sharing is not entirely
>> an obvious win.
>>
>>
> That's okay, and I'm cool with trying some code. I primarily use R for
> data analysis/plotting, but have started getting introduced to python via
> Coursera recently and hope to dig in more.
>

R won't be fun for doing this. R makes it a huge pain to even do the kind
of data cleaning necessary for prepping data for analysis. Definitely use
Python instead; you'll save a lot of time. If you really need some
specialized R module, you can create numpy arrays in Python and there is a
module that allows you to invoke the R runtime with these. Best of both
worlds, but I doubt you'll need it.


> That's another thing that attracts me to beancount :) That said, these are
> more just general questions at this point. I'm amazed at how much
> documentation there is... but for a total noob, I can say it's a bit
> intimidating and kind of hard to know where one should start! Not to
> mention having questions and not being sure you're even searching for the
> right terminology to answer your question.
>
>
>
>> By the way, I've found that regression testing is the _key_ to
>> maintaining your importer code, because those importers are often written
>> against file formats with no official spec, and nasty surprises show up
>> routinely (e.g. I have XML files with some unescaped "&" characters, which
>> require a custom fix "just for that bank"), so you really need to be able
>> to reproduce your tests. I think I have to make at least _some_ fix to an
>> importer about once/month, and that sinks maybe a half-hour (it involves
>> adding the new file that breaks it, fixing the importer code, and
>> potentially updating the older expected files for changes).
>>
>> I hope this helps give some color to the process,
>>
>>
>>
> Definitely, and sincere thanks for taking the time to give me some
> pointers!
>
>
> John
>
>
>>
>>
>>> I tried to search the list for more of this sort of question, so
>>> forgive me if I've missed something. Replying with links pointing me
>>> in the right direction would be plenty sufficient if this has already
>>> been discussed!
>>>
>>>
>>> Thanks!
>>> John
>>>
>>>
>>> [1] http://moneydance.com/
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Ledger" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to ledger-cli+...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
