These would be better done in two separate steps IMHO:

1. extract the data from whichever external source format (e.g. OFX) into
an internal transaction data structure
2. "complete" incomplete imported transaction objects by adding missing
legs using the past Ledger history
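
To make the split concrete, here is a minimal Python sketch (Python because that's what Beancount's sources are in). Posting, Transaction, extract() and complete() are hypothetical names for illustration, not Beancount's actual API. A plain payee-to-account dict stands in for "the past Ledger history".

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List

@dataclass
class Posting:
    account: str
    amount: Decimal

@dataclass
class Transaction:
    date: str
    payee: str
    postings: List[Posting] = field(default_factory=list)

    def balance(self) -> Decimal:
        return sum((p.amount for p in self.postings), Decimal("0"))

    def is_balanced(self) -> bool:
        return self.balance() == 0

# Step 1 (hypothetical): extraction only knows the source format and emits
# possibly incomplete transactions carrying a single leg.
def extract(rows: List[Dict[str, str]], account: str) -> List[Transaction]:
    return [
        Transaction(r["date"], r["payee"], [Posting(account, Decimal(r["amount"]))])
        for r in rows
    ]

# Step 2 (hypothetical): completion is independent of the source format.
# A payee -> account lookup stands in for learning from past history.
def complete(txns: List[Transaction], history: Dict[str, str],
             fallback: str = "Expenses:Unknown") -> List[Transaction]:
    for t in txns:
        if not t.is_balanced():
            account = history.get(t.payee, fallback)
            t.postings.append(Posting(account, -t.balance()))
    return txns
```

Usage: complete(extract(rows, "Assets:Checking"), {"Grocer": "Expenses:Food"}) adds an Expenses:Food leg that balances each Grocer row. Any importer can be swapped in for extract() without touching complete().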

About (1): CSV files are pretty rare. The only ones I've come across (in my
own little bubble of a world) are from PayPal, OANDA, and Ameritrade. Much more
common for banks, investment and credit card companies are OFX and Quicken
files. I also find it convenient to recognize at least *some* data from PDF
files, such as the date of a statement, for automatic classification and
filing into a folder. (You could apply machine learning to this problem,
i.e. feed it the jumble of words produced by what is largely imperfect
PDF-to-text conversion and have it classify which statement it is, but
crafting a few regexps by hand has worked quite well so far.) I'll
add anonymized example input files to Beancount for automated testing at
some point; they'll go here:
https://hg.furius.ca/public/beancount/file/tip/src/python/beancount/sources
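
For illustration, the hand-crafted regexp approach to classifying and filing statements might look something like the sketch below. It assumes the PDF has already been converted to text. The institution names, patterns, date format and folder layout are all made up for the example.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical signatures; real ones get written by hand after looking at
# the text each institution's PDFs produce.
SIGNATURES = {
    "acme-bank-checking": re.compile(r"ACME\s+BANK.*CHECKING", re.I | re.S),
    "example-card": re.compile(r"Example\s+Card\s+Services", re.I),
}

# Assumed statement date layout, e.g. "Statement Date: 01/24/2014" (MM/DD/YYYY).
DATE_RE = re.compile(r"Statement\s+Date:?\s*(\d{1,2})/(\d{1,2})/(\d{4})", re.I)

def classify(text: str) -> Tuple[Optional[str], Optional[date]]:
    # First signature that matches anywhere in the (noisy) text wins.
    kind = next((name for name, rx in SIGNATURES.items() if rx.search(text)), None)
    m = DATE_RE.search(text)
    when = date(int(m.group(3)), int(m.group(1)), int(m.group(2))) if m else None
    return kind, when

def filing_path(text: str, root: str = "documents") -> Optional[str]:
    # Only file a statement when both the kind and the date were recognized.
    kind, when = classify(text)
    if kind is None or when is None:
        return None
    return f"{root}/{kind}/{when.isoformat()}.pdf"
```

Because the patterns only need to pick out a couple of distinctive strings, they tend to tolerate the garbled word order that PDF-to-text conversion produces.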

I'm thinking... maybe it would make sense for importers (mine and/or
yours) to spit out some sort of XML/JSON format that could be converted
into either Ledger or Beancount syntax or whatever else? This way all those
importers could be farmed out to another project and reused by users of
various accounting software. Does this make sense?
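
As a rough idea of what that could look like, here is a sketch of one possible JSON shape plus two small renderers, one for Ledger and one for Beancount. The JSON layout and the function names are assumptions for illustration only. No such shared format exists yet.

```python
import json
from typing import Any, Dict

# Hypothetical neutral format an importer could emit.
SAMPLE = """
[{"date": "2014-01-20", "payee": "Grocer",
  "postings": [
    {"account": "Expenses:Food", "amount": "42.10", "currency": "USD"},
    {"account": "Assets:Checking", "amount": "-42.10", "currency": "USD"}]}]
"""

def to_ledger(txn: Dict[str, Any]) -> str:
    # Ledger: "YYYY/MM/DD payee", then indented postings with at least two
    # spaces between the account name and the amount.
    lines = [f"{txn['date'].replace('-', '/')} {txn['payee']}"]
    for p in txn["postings"]:
        lines.append(f"    {p['account']}  {p['amount']} {p['currency']}")
    return "\n".join(lines)

def to_beancount(txn: Dict[str, Any]) -> str:
    # Beancount: "YYYY-MM-DD * \"narration\"", then indented postings.
    lines = [f'{txn["date"]} * "{txn["payee"]}"']
    for p in txn["postings"]:
        lines.append(f"  {p['account']}  {p['amount']} {p['currency']}")
    return "\n".join(lines)

def convert(text: str, fmt: str) -> str:
    render = {"ledger": to_ledger, "beancount": to_beancount}[fmt]
    return "\n\n".join(render(t) for t in json.loads(text))
```

The point of the design is that each importer only has to target the JSON, and each accounting tool only needs one small renderer like these.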

About (2): If Ledger supports inputting incomplete transactions, you could
do this without relying on CSV conversion, which would be much more
reusable. In Beancount, my importers are allowed to create invalid
transaction objects, and I plan to put in a simple little perceptron
function that should do a good enough job of adding missing legs
automatically (one might call this "automatic categorization"),
independently of the input data format.

Just some ideas,




On Fri, Jan 24, 2014 at 4:55 AM, Edwin van Leeuwen <[email protected]> wrote:

> Hi all,
>
> Reckon needs your help :)
>
> Reckon automagically converts CSV files for use with the command-line
> accounting tool Ledger. It also helps you to select the correct
> accounts associated with the CSV data using Bayesian machine learning.
> For more information see:
>
> http://blog.andrewcantino.com/blog/2010/11/06/command-line-accounting-with-ledger-and-reckon/
>
> We would like to expand reckon's ability to automagically convert csv
> files. It already supports quite a few formats, but we are interested
> in taking this further. For that we need more csv examples, so that we
> can make sure those are correctly detected and especially make sure no
> mistakes are made. You could really help us out by sending us
> (anonymized) csv files as produced by your bank. We'd add those
> examples to our test suite and make sure it all works well. Ideally,
> we'd need a csv file containing a minimum of 5 transactions.
>
> The formats currently in the test suite are here:
>
> https://github.com/cantino/reckon/blob/master/spec/reckon/csv_parser_spec.rb#L207
>
> Full disclosure: I am not the original author, but have been
> contributing code to make it correctly convert my csv files :)
>
> Cheers, Edwin
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Ledger" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
