Re: Getting started; assigning accounts to bank .csv data

Martin Blais Tue, 02 Feb 2016 20:10:06 -0800

BTW, here's an auto-generated example file that looks similar to how I
organize mine using org-mode:
https://bitbucket.org/blais/beancount/src/tip/examples/example.beancount



On Tue, Feb 2, 2016 at 11:07 PM, Martin Blais <[email protected]> wrote:

> On Tue, Feb 2, 2016 at 10:48 PM, John Hendy <[email protected]> wrote:
>>
>> On Monday, February 1, 2016 at 10:41:26 PM UTC-6, Martin Blais wrote:
>>
>>> On Mon, Feb 1, 2016 at 1:13 PM, John Hendy <[email protected]> wrote:
>>>
>>>> Greetings,
>>>>
>>>>
>>>> It's a fresh year and I've been seeing ledger come up on the Org-mode
>>>> mailing list for some time and decided to give it a try. I'm coming
>>>> from Moneydance and just wanted to get away from the tedious GUI
>>>> method of adding information, as well as have flexibility to generate
>>>> my own reports/visualizations with python or R, etc. [1]
>>>>
>>>> Consider that I'm about a week into reading through docs here and
>>>> there during evenings. My first step was going to be importing a
>>>> downloaded .csv from my bank to get started. I'm still trying to
>>>> verify I get the terminology, so I'll use this from the manual:
>>>>
>>>> From 5.1 Basic format:
>>>> ```
>>>> This transaction has a date, a payee or description, a target account
>>>> (the first posting), and a source account (the second posting). Each
>>>> posting specifies what action is taken related to that account.
>>>> ```
>>>>
>>>> From 7.2.1.2 The convert command:
>>>> ```
>>>> The fields ledger can recognize contain these case-insensitive strings
>>>> date, posted, code, payee or desc or description, amount, cost, total,
>>>> and note.
>>>> ```
>>>>
>>>> For my purposes, I import my finances primarily to "categorize" (what
>>>> I believe here is called adding an account) and assign a payee so that
>>>> I can track my spending against a budget. So, I'm surprised there's no
>>>> special column keyword I can add for "account". It appears that all I
>>>> can do is pass, say, `--account "assets:checking"` to have ledger know
>>>> it's against assets:checking? Is that correct?
>>>>
>>>> From trying to google "import csv account ledger" or similar
>>>> variations, I've been surprised that the only tools to do something
>>>> like this appear to be interactive one-trans-at-a-time programs like
>>>> icsv2ledger and reckon (granted, they can learn or follow rules). I
>>>> could quickly go through my bank's .csv and add exp:food:dining,
>>>> exp:auto:fuel to my ~100 transactions a month and have those imported
>>>> just like the other column data.
>>>>
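One way to carry a hand-filled account column from a spreadsheet into ledger format is a small pre-processing script. The sketch below is only an illustration: the column names `date`, `payee`, `amount` and `account`, the function name `csv_to_ledger`, and the sample rows are all assumptions, not something ledger's convert command defines.

```python
import csv
import io
from decimal import Decimal


def csv_to_ledger(csv_text, source_account="assets:checking"):
    """Render each CSV row as a two-posting ledger transaction."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row["amount"])
        # A negative bank amount is money leaving the source account,
        # so the category account receives the opposite sign.
        entries.append(
            f"{row['date']} {row['payee']}\n"
            f"    {row['account']}  ${-amount:.2f}\n"
            f"    {source_account}\n"  # amount elided; ledger infers it
        )
    return "\n".join(entries)


# Hypothetical bank export with an extra, manually added "account" column.
SAMPLE = """date,payee,amount,account
2016/01/05,Corner Bistro,-23.50,exp:food:dining
2016/01/07,Shell,-31.20,exp:auto:fuel
"""

if __name__ == "__main__":
    print(csv_to_ledger(SAMPLE))
```

The second posting leaves its amount out so that ledger balances the transaction itself; the category column stays editable in the spreadsheet until the file is converted.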
>>>
>>>
>> Thanks for the awesome reply!
>>
>>
>>> Keep in mind that part of the process of importing (they like to call it
>>> "reconciling") involves
>>> - Manually reviewing the transactions for correctness or fraud
>>>
>>
>> I'll get there. For better or worse, I take the downloaded bank .csv as
>> "truth" and am mostly interested in getting a better handle on what my
>> money is used on, budgeting, planning, etc.
>>
>>
>>> - Merging new transactions with previous transactions imported from the
>>> other side (e.g. a payment from a bank account to pay off one's credit card
>>> will typically be imported from both the bank AND credit card accounts; you
>>> must merge the corresponding transactions together)
>>>
>>
>> Definitely. Moneydance allowed me to input an account, which would "link"
>> the transaction. Then I'd have to delete or merge the other account's
>> record of the same transaction.
>>
>
> BTW, there are some ideas around about automatically merging two
> incomplete transactions. This problem is the dual of the settlement-date
> problem: the two sides of a transaction may settle on different days.
> See http://furius.ca/beancount/doc/proposal-settlement for some
> ruminations, and search the mailing list; there is more discussion there.
>
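The merging idea mentioned above can be approximated with a simple heuristic: pair entries whose amounts cancel out and whose dates fall within a few days of each other. This is a minimal sketch under stated assumptions, not the approach from the settlement proposal; the function name `match_transfers`, the `(date, amount)` tuple layout, the sign convention, and the three-day window are all hypothetical.

```python
from datetime import date, timedelta
from decimal import Decimal


def match_transfers(bank_rows, card_rows, max_days=3):
    """Pair rows whose amounts cancel and whose dates are close.

    Each row is a (date, amount) tuple. Assumed sign convention: a payment
    is negative on the bank side and positive on the credit card side.
    Returns (pairs, unmatched_bank, unmatched_card).
    """
    window = timedelta(days=max_days)
    remaining = list(card_rows)
    pairs, unmatched_bank = [], []
    for bank_row in bank_rows:
        b_date, b_amount = bank_row
        for candidate in remaining:
            c_date, c_amount = candidate
            if b_amount + c_amount == 0 and abs(b_date - c_date) <= window:
                pairs.append((bank_row, candidate))
                remaining.remove(candidate)  # each card row matches once
                break
        else:
            # No card row matched this bank row.
            unmatched_bank.append(bank_row)
    return pairs, unmatched_bank, remaining
```

Anything left unmatched still needs a manual look, which fits the review step of the process anyway.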
>
>
>>> - Assigning the right category (you can automate this with a script I
>>> suppose; frankly it's not much work, I do all of mine manually with the
>>> help of auto-completion from Emacs, which is the most important feature IMO)
>>>
>>
>> Huh. Yes, I'll definitely have to look into the emacs mode. I assumed
>> once it was in ledger format it would be *a lot* harder to navigate around
>> vs. just doing it while it's already in a spreadsheet format.
>>
>
> Definitely not, text is there for your pleasure. You typically organize
> your Ledger input file in the order that makes the most sense for you
> (minus some constraints: Ledger will report the transactions in the order
> they appear in the file and the balance assertions are computed as such.
> Beancount sorts everything by date so order doesn't matter).
>
>
>
>>> - Moving the resulting transactions to the right place in your file.
>>>
>>
>> I'll have to look into this more. I get that this is the ledger list...
>> but is beancount different in this respect? From reading your docs, it
>> sounded like beancount didn't care about order. Or are there other reasons
>> (besides date) that one would have to move transactions around?
>>
>
> In Ledger, the reporting is done in file order. Balance assertions as well.
> In Beancount, order is by date, so you don't have to care about how you
> organize them.
> I think - but I'm not 100% sure - that most Ledger users must store their
> input file by section, and in each section in date order, to minimize the
> number of out-of-order transactions if they print out a register.
> I use org-mode to create sections and each section is stored in date order
> for some subset of accounts.
>
>
>
>>> - Verifying balances visually, or inserting a balance directive which
>>> asserts what the final account balance should be (for correctness) after
>>> the new transactions.
>>>
>>> If you do it often enough and you have editing chops, you get used to
>>> the dance and it's a breeze.
>>> I think the fourth step can be hypothetically solved using heuristics.
>>>
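The balance check described above boils down to summing the imported amounts up to a date and comparing the result with the figure on the statement. The sketch below shows that arithmetic only; the function name `balance_difference`, the `(date, amount)` tuple layout, and the choice to include transactions dated on the assertion day are assumptions, and neither Ledger's nor Beancount's exact semantics are implied.

```python
from datetime import date
from decimal import Decimal


def balance_difference(transactions, as_of, expected, opening=Decimal("0")):
    """Return computed balance minus expected balance as of a given date.

    `transactions` is an iterable of (date, amount) tuples for one account.
    A result of zero means the assertion holds.
    """
    total = opening + sum(
        (amount for txn_date, amount in transactions if txn_date <= as_of),
        Decimal("0"),
    )
    return total - expected
```

A non-zero result gives the size of the discrepancy, which is often enough to spot the missing or duplicated transaction.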
>>>
>>>
>>>> I feel like I must be missing something with respect to getting the
>>>> from/to accounts added to the bank data.
>>>>
>>>> Perhaps to take a step back...
>>>> - are the majority of folks writing their transactions by hand in
>>>> ledger format?
>>>>
>>>
>>> Can't say about others, but for me I want to say that about half the
>>> importing is semi-automatic.
>>> - Credit cards and banks import from downloads but I need to categorize
>>> manually (as described above), fairly good quality downloads.
>>> - Investment accounts: buys are fully automated, but I need to manually
>>> edit sales in some accounts. Great quality of downloads.
>>> - Payroll stubs and vesting and a few other things are provided only as
>>> PDFs and I don't bother trying to extract (though I've made some headway
>>> towards this, it's incomplete; it turns out fully automating table
>>> extraction from PDF isn't trivial. The best OSS solution is TabulaPDF by
>>> far but you still need to manually identify where the table is).
>>> - Cash transactions: I have to enter those by hand. I only book non-food
>>> expenses as individual transactions directly, and for food maybe once every
>>> six months I'll count my wallet balance and insert one transaction per
>>> month to debit away the cash account toward food. If you do this, you end
>>> up with surprisingly few transactions to book manually, maybe a
>>> few/week. I suppose it could depend on lifestyle choices.
>>>
>>> It takes me less than 1 hour/week to run through the active accounts,
>>> usually first thing Saturday morning when I get up. Most of the pain is
>>> logging in with usernames/passwords to the various institutions and clicking the
>>> right buttons to generate the downloaded files. Extraction and filing is
>>> automated using importers I wrote against LedgerHub. Less active accounts
>>> are updated every quarter or when I feel like it.
>>>
>>>
>> This is a helpful time estimate/reference. My main account (checking) has
>> ~100 transactions per month. I don't mind categorizing them myself, but I
>> hoped for a quick-ish way to do that. Typing "expenses:blah:blah" is pretty
>> fast in a spreadsheet. While I *use* emacs, I'm no navigation whiz, and
>> going to the right place in a block of text to type the same thing seems
>> super tedious vs. a spreadsheet. Hence I was puzzled that I couldn't use
>> ledger's convert command to just bring in accounts from the .csv along with
>> the rest. After all, all the dates and amounts are there, one can add
>> payees... why not accounts?
>>
>
> You can probably script that away with a few rules.
> I admit that 100 txns/month is more than I have, and I might look into
> auto-categorizing most of it myself if I were in that situation.
> Problem is, everyone's little scripts appear to have little in common.
>
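Scripting the categorization "with a few rules" could look roughly like the sketch below: an ordered list of regular expressions matched against the payee, with the first hit deciding the account. The patterns, the account names, the `exp:unknown` fallback, and the function name `categorize` are hypothetical and would need to be adapted to one's own bank data.

```python
import re

# Ordered (pattern, account) rules; the first matching pattern wins.
RULES = [
    (re.compile(r"shell|exxon", re.IGNORECASE), "exp:auto:fuel"),
    (re.compile(r"bistro|pizza|cafe", re.IGNORECASE), "exp:food:dining"),
]


def categorize(payee, rules=RULES, default="exp:unknown"):
    """Return the account of the first rule whose pattern occurs in `payee`."""
    for pattern, account in rules:
        if pattern.search(payee):
            return account
    # Unmatched payees get a placeholder account to fix up by hand.
    return default
```

Searching the output for the fallback account afterwards leaves only the leftovers to categorize manually.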
>
>
>>>> - is there some better way to import bulk data (e.g. via ledger's
>>>> convert function) and post-edit once it's in ledger format? It seemed
>>>> a .csv in LO calc was pretty convenient vs. scrolling through a long
>>>> text file
>>>
>>>> - any other pointers along the above lines would be most welcome.
>>>>
>>>
>>> Check out LedgerHub for ideas.
>>>
>>> Original design doc:
>>> http://furius.ca/ledgerhub/doc/design
>>>
>>> Post-mortem:
>>> http://furius.ca/ledgerhub/doc/postmortem
>>>
>>> The project is being killed right now, rewritten much better and simpler
>>> and migrated into the Beancount project; if you do end up looking at the
>>> code make sure you're checking out the "stable" branch, it's a bit of a
>>> riot on the default branch right now, it will be broken.
>>>
>>> Essentially, I'm defining a config (in Python) as a list of "importer"
>>> objects and boil the process down to three steps:
>>> 1. Identify: Given a messy list of downloaded files (e.g. in
>>> ~/Downloads), automatically identify which importer is supposed to handle
>>> them
>>> 2. Extract: Extracting transactions and statement date from each file,
>>> if possible
>>> 3. File: Filing away the downloads to a directory hierarchy which
>>> mirrors the chart of accounts, for preservation, e.g. in a personal git
>>> repo.
>>>
>>> You could think of adding
>>> 0. Fetch: Automatically download the files
>>> but that's too hard. Personally I just don't have the stamina to
>>> implement this for myself. Given the nature of today's websites and the
>>> castles of JavaScript used to implement them, this would be a nightmare to
>>> implement for too little payoff. I love the idea of full automation, but I
>>> just don't have the time. Note that if you don't mind the nature of their
>>> business (they sell your data), you could potentially try to use Yodlee to
>>> pull much of it from a single place.
>>>
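The identify / extract / file steps described above can be sketched as a small class plus a driver loop. This is an illustration of the idea only and is not LedgerHub's or Beancount's actual API: the class name `CsvImporter`, its method signatures, the header-matching heuristic, and the `process` function are all assumptions made for the example.

```python
import csv
import shutil
from pathlib import Path


class CsvImporter:
    """Hypothetical importer for one institution's CSV export."""

    def __init__(self, account, header):
        self.account = account  # e.g. "Assets:Checking"
        self.header = header    # exact first line of this bank's CSV files

    def identify(self, path):
        # Step 1: claim the file only if its first line matches our header.
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.readline().strip() == self.header

    def extract(self, path):
        # Step 2: return one dict per data row; real code would build
        # transactions and pick out a statement date here.
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    def file(self, path, root):
        # Step 3: move the download under a directory tree that mirrors
        # the account name, e.g. root/Assets/Checking/jan.csv.
        dest_dir = Path(root).joinpath(*self.account.split(":"))
        dest_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.move(str(path), str(dest_dir / Path(path).name)))


def process(downloads, importers, root):
    """Run identify, extract, file over every regular file in `downloads`."""
    results = []
    for path in sorted(Path(downloads).iterdir()):
        if not path.is_file():
            continue
        for importer in importers:
            if importer.identify(path):
                rows = importer.extract(path)
                results.append((importer.account, rows, importer.file(path, root)))
                break  # first matching importer wins
    return results
```

Files that no importer recognizes are simply left in place, so a messy downloads directory is safe to point this at.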
>>>
>> Yeah, not interested in that. It's not a big deal to download the few
>> files I need.
>>
>>
>>> In any case, you can't really get away without writing at least some
>>> code--it's just not realistic, the inputs from different people vary too
>>> much. There's very little shared code out there (just basic code for CSV
>>> files, like the ones you mention) and too few users share the same
>>> accounts to generate the critical mass needed for reuse. A while back I
>>> created the LedgerHub project to host shared importer code and provide a
>>> framework for doing the above, but it never received many contributions and
>>> honestly I didn't give it the care and attention to quality I should have.
>>> More importantly, regression testing for those importers is most easily
>>> carried out using actual downloaded files compared to a corresponding
>>> expected output, but these files don't share well (they contain lots of
>>> personal data) so one ends up with two repositories anyhow. And besides
>>> there are several design decisions in some importers that may not please
>>> every user, in particular about how you choose your accounts for
>>> investments (there are degrees of freedom), so even sharing is not entirely
>>> an obvious win.
>>>
>>>
>> That's okay, and I'm cool with trying some code. I primarily use R for
>> data analysis/plotting, but have started getting introduced to python via
>> Coursera recently and hope to dig in more.
>>
>
> R won't be fun for doing this. R makes it a huge pain to even do the kind
> of data cleaning necessary for prepping data for analysis. Definitely use
> Python over it, you'll save a lot of time. If you really need some
> specialized R module, you can create numpy arrays in Python and there is a
> module that allows you to invoke the R runtime with these. Best of both
> worlds, but I doubt you'll need it.
>
>
>> That's another thing that attracts me to beancount :) That said, these are
>> more just general questions at this point. I'm amazed at how much
>> documentation there is... but for a total noob, I can say it's a bit
>> intimidating and kind of hard to know where one should start! Not to
>> mention having questions and not being sure you're even searching for the
>> right terminology to answer your question.
>>
>>
>>
>>> By the way, I've found that regression testing is the _key_ to
>>> maintaining your importer code, because those importers are often written
>>> against file formats with no official spec and unexpected surprises show up
>>> routinely (e.g. I have XML files with some unescaped "&" characters, which
>>> require a custom fix "just for that bank", for instance, lots of nasty
>>> surprises), so you really need to be able to reproduce your tests. I think
>>> I have to make at least _some_ fix to an importer about once/month, and
>>> that sinks maybe a half-hour (it involves adding the new file that makes it
>>> break, fixing the importer code, and potentially updating the older expected
>>> files for changes).
>>>
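The regression-testing workflow described above (a sample download kept next to its expected output) can be sketched as follows. The helper name `check_against_expected`, the `.expected` file suffix, and the convention that `extract` returns text are assumptions for illustration, not the layout any existing importer project uses.

```python
import difflib
from pathlib import Path


def check_against_expected(extract, sample_path, update=False):
    """Compare extract(sample) with the text stored in '<sample>.expected'.

    Returns a list of unified-diff lines; an empty list means the importer
    still produces the recorded output. If no expected file exists yet, or
    `update` is true, the current output is recorded and [] is returned.
    """
    sample_path = Path(sample_path)
    expected_path = sample_path.with_name(sample_path.name + ".expected")
    actual = extract(sample_path)
    if update or not expected_path.exists():
        expected_path.write_text(actual, encoding="utf-8")
        return []
    expected = expected_path.read_text(encoding="utf-8")
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        "expected", "actual", lineterm=""))
```

When a bank changes its format, the failing download is added as a new sample, the importer is fixed, and older expected files are regenerated with `update=True` only after the diff has been reviewed.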
>>> I hope this helps give some color to the process,
>>>
>>>
>>>
>> Definitely, and sincere thanks for taking the time to give me some
>> pointers!
>>
>>
>> John
>>
>>
>>>
>>>
>>>> I tried to search the list for more of this sort of question, so
>>>> forgive me if I've missed something. Replying with links pointing me
>>>> in the right direction would be plenty sufficient if this has already
>>>> been discussed!
>>>>
>>>>
>>>> Thanks!
>>>> John
>>>>
>>>>
>>>> [1] http://moneydance.com/
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Ledger" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

