petl <https://petl.readthedocs.io/en/stable/> is an ETL library for tables that Martin recently shared. It makes working with .csv files really easy. Given that writing and maintaining importers can be time-consuming (not to mention not much fun), I thought I'd share a walkthrough of how easy petl makes it, below. I hope this inspires others to write and share importers. (My beancount_reds_importers <https://github.com/redstreet/beancount_reds_importers> framework makes a few other things easier as well.)
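(If you haven't tried petl yet, the basic pattern is just a couple of lines. The file and contents below are made up for illustration; fromcsv, look, and header are standard petl calls.)

    import petl as etl

    table = etl.fromcsv('example.csv')   # hypothetical file name
    print(table.look())                  # peek at the first few rows, nicely formatted
    print(etl.header(table))             # tuple of column names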
This csv importer uses petl to read a .csv into a table for manipulation. The output of this reader is a list of namedtuples corresponding roughly to OFX transactions. The following steps achieve this (based on my comments in the code <https://github.com/redstreet/beancount_reds_importers/blob/master/beancount_reds_importers/libimport/csvreader.py>):

- read the csv into a petl table:

    rdr = etl.fromcsv(file.name)

- skip header and footer rows (configurable):

    rdr = rdr.skip(self.skip_head_rows)
    rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)

- prepare_raw_columns: an overridable method to help get the raw table in shape. As an example, the Schwab importer does the following:

  - remove the last column, which is empty:

      rdr = rdr.cutout('')

  - for rows with interest, the date column contains text such as '11/16/2018 as of 11/15/2018'. We convert these into a regular, parseable date ('11/16/2018'):

      def cleanup_date(d):
          """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
          return d.split(' ', 1)[0]

      rdr = rdr.convert('Date', cleanup_date)

  - add a 'tradeDate' column, which is a copy of the 'Date' column, to correspond to the importer API:

      rdr = rdr.addfield('tradeDate', lambda x: x['Date'])

  - add a 'total' column, which is a copy of the 'Amount' column, to correspond to the importer API

- rename columns: column headers are renamed to standardize them to the importer API, using a supplied dictionary. For the included Schwab importer, that looks like:

    self.header_map = {
        "Action":      'type',
        "Date":        'date',
        "tradeDate":   'tradeDate',
        "Description": 'memo',
        "Symbol":      'security',
        # etc.
    }

    rdr = rdr.rename(self.header_map)

- convert_columns: this fixes up the actual data in each column. The base class does the following:

  - map transaction types to standard types. The standard types the importer API uses are loosely based on the OFX standard. For example, the Schwab importer needs this mapping:

      self.transaction_type_map = {
          'Bank Interest':      'income',
          'Buy':                'buystock',
          'Cash Dividend':      'dividends',
          'MoneyLink Transfer': 'transfer',
          'Reinvest Dividend':  'dividends',
          'Reinvest Shares':    'buystock',
          'Sell':               'sellstock',
      }

      rdr = rdr.convert('type', self.transaction_type_map)

  - numbers are parsed from strings and converted into Decimal type; non-numeric characters like '$' are removed

  - dates are parsed and converted into datetime type

- The table is now ready for use by the importer. petl makes each row available via namedtuples! A consolidated sketch of these steps, end to end, follows below.
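To make the walkthrough concrete, here is a minimal, standalone sketch that strings the same steps together outside the importer framework. This is not the actual csvreader.py: the file name, the skip counts, the column names, and the two mapping dictionaries are made-up assumptions for illustration, while the petl calls themselves (fromcsv, skip, head, convert, addfield, rename, sub, nrows, namedtuples) are standard petl.

    import datetime
    from decimal import Decimal

    import petl as etl

    # Hypothetical Schwab-like column names and mappings, for illustration only.
    HEADER_MAP = {'Action': 'type', 'Date': 'date', 'Description': 'memo',
                  'Symbol': 'security', 'Amount': 'amount'}
    TYPE_MAP = {'Bank Interest': 'income', 'Buy': 'buystock', 'Sell': 'sellstock'}

    def cleanup_date(d):
        """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
        return d.split(' ', 1)[0]

    def parse_date(d):
        return datetime.datetime.strptime(d, '%m/%d/%Y')

    rdr = etl.fromcsv('transactions.csv')            # read csv into a petl table (hypothetical file)
    rdr = rdr.skip(1)                                # assume 1 junk row above the real header
    rdr = rdr.head(rdr.nrows() - 1)                  # assume 1 footer row to drop
    rdr = rdr.convert('Date', cleanup_date)          # '11/16/2018 as of ...' -> '11/16/2018'
    rdr = rdr.addfield('tradeDate', lambda r: r['Date'])
    rdr = rdr.rename(HEADER_MAP)                     # standardize column names
    rdr = rdr.convert('type', TYPE_MAP)              # map to OFX-like transaction types
    rdr = rdr.sub('amount', r'[$,]', '')             # strip '$' and thousands separators
    rdr = rdr.convert('amount', Decimal)             # numbers as Decimal
    rdr = rdr.convert('date', parse_date)            # dates as datetime
    rdr = rdr.convert('tradeDate', parse_date)

    for txn in rdr.namedtuples():                    # rows as namedtuples
        print(txn.date, txn.type, txn.security, txn.amount)

One nice property of composing the importer this way: petl pipelines are lazy, so each step returns a lightweight table view and the .csv is only read when something iterates over it (nrows() and the final loop here).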
