These would be better done in two separate steps IMHO:

1. Extract the data from whichever external source format (e.g. OFX) into an internal transaction data structure.
2. "Complete" incomplete imported transaction objects by adding the missing legs using the past Ledger history.
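To make step (2) concrete, here's a rough Python sketch of what I have in mind. All the names here (Posting, Transaction, complete) are made up for illustration -- they're not Beancount's or Ledger's actual types -- and the "classifier" is just a most-frequent-account lookup standing in for the perceptron:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Posting:
    account: str
    amount: float  # a real implementation would use Decimal plus a currency

@dataclass
class Transaction:
    date: str
    payee: str
    postings: list = field(default_factory=list)

    def is_complete(self):
        # A transaction balances when its legs sum to zero.
        return abs(sum(p.amount for p in self.postings)) < 1e-9

def complete(txn, history):
    """Add the missing leg by guessing the counter-account from past
    transactions with the same payee (a crude stand-in for a trained
    perceptron or Bayesian classifier)."""
    if txn.is_complete():
        return txn
    seen = {q.account for q in txn.postings}
    counts = Counter(
        p.account
        for past in history if past.payee == txn.payee
        for p in past.postings
        if p.account not in seen
    )
    guess = counts.most_common(1)[0][0] if counts else "Expenses:Uncategorized"
    residual = -sum(p.amount for p in txn.postings)
    txn.postings.append(Posting(guess, residual))
    return txn

# Step (1) would produce something like this one-legged import:
history = [Transaction("2014-01-02", "STARBUCKS",
                       [Posting("Assets:Checking", -4.50),
                        Posting("Expenses:Coffee", 4.50)])]
imported = Transaction("2014-01-20", "STARBUCKS",
                       [Posting("Assets:Checking", -5.25)])
complete(imported, history)  # adds an Expenses:Coffee leg for 5.25
```

The point is that this completion step only sees the internal data structure, so it works the same no matter which source format step (1) parsed.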
About (1): CSV files are pretty rare. The only ones I've come across (in my own little bubble of a world) are PayPal, OANDA, and Ameritrade. Much more common for banks, investment and credit card companies are OFX and Quicken files. I also find it convenient to recognize at least *some* data from PDF files, such as the date of a statement, for automatic classification and filing into a folder. (You could apply machine learning to this problem, i.e. feed it a whole bunch of disorganized words from what is largely imperfect PDF-to-text conversion and classify which statement it is, but crafting a few regexps by hand has proved to work quite well so far.)

I'll add anonymized example input files to Beancount for automated testing at some point; they'll be going here:
https://hg.furius.ca/public/beancount/file/tip/src/python/beancount/sources

I'm thinking... maybe it would make sense for importers (mine and/or yours) to spit out some sort of XML/JSON format that could be converted into either Ledger or Beancount syntax, or whatever else? That way all those importers could be farmed out to another project and reused by users of various accounting software. Does this make sense?

About (2): If Ledger supports inputting incomplete transactions, you could do this without relying on CSV conversion, which would be much more reusable. In Beancount, my importers are allowed to create invalid transaction objects, and I plan to put in a simple little perceptron function that should do a good enough job of adding missing legs automatically (one might call this "automatic categorization"), independently of the input data format.

Just some ideas,

On Fri, Jan 24, 2014 at 4:55 AM, Edwin van Leeuwen <[email protected]> wrote:

> Hi all,
>
> Reckon needs your help :)
>
> Reckon automagically converts CSV files for use with the command-line
> accounting tool Ledger. It also helps you to select the correct
> accounts associated with the CSV data using Bayesian machine learning.
> For more information see:
>
> http://blog.andrewcantino.com/blog/2010/11/06/command-line-accounting-with-ledger-and-reckon/
>
> We would like to expand reckon's ability to automagically convert csv
> files. It already supports quite a few formats, but we are interested
> in taking this further. For that we need more csv examples, so that we
> can make sure those are correctly detected and especially make sure no
> mistakes are made. You could really help us out by sending us
> (anonymized) csv files as produced by your bank. We'd add those
> examples to our test suite and make sure it all works well. Ideally,
> we'd need a csv file containing a minimum of 5 transactions.
>
> The formats currently in the test suite are here:
>
> https://github.com/cantino/reckon/blob/master/spec/reckon/csv_parser_spec.rb#L207
>
> Full disclosure: I am not the original author, but have been
> contributing code to make it correctly convert my csv files :)
>
> Cheers, Edwin
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Ledger" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
