petl <https://petl.readthedocs.io/en/stable/> is an ETL library for tables 
that Martin recently shared. It makes working with .csv files really easy. 
Given that writing and maintaining importers can be time-consuming (not to 
mention not much fun), I thought I'd share a walkthrough below of how easy 
petl makes it. I hope this inspires others to write and share importers. 
(My beancount_reds_importers 
<https://github.com/redstreet/beancount_reds_importers> framework makes a 
few other things easier as well.)

This csv importer uses petl to read a .csv file into a table for 
manipulation. The output of this reader is a list of namedtuples 
corresponding roughly to OFX transactions. The following steps achieve this 
(based on my comments in the code 
<https://github.com/redstreet/beancount_reds_importers/blob/master/beancount_reds_importers/libimport/csvreader.py>
):

   - read the csv into a petl table:
      - rdr = etl.fromcsv(file.name)
      - skip header and footer rows (configurable):
      - rdr = rdr.skip(self.skip_head_rows)
      - rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)
      - prepare_raw_columns: an overridable method to help get the raw 
   table into shape. As an example, the Schwab importer does the following:
      - rdr = rdr.cutout('') # remove the last column, which is empty
      - for rows with interest, the date column contains text such as 
      '11/16/2018 as of 11/15/2018'. We convert these into a regular, 
      parseable date ('11/16/2018'):
              def cleanup_date(d):
                  """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
                  return d.split(' ', 1)[0]
              rdr = rdr.convert('Date', cleanup_date)
      
      - add a 'tradeDate' column, which is a copy of the 'Date' column, to 
      correspond to the importer API:
         - rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
      - add a 'total' column, which is a copy of the 'Amount' column, to 
      correspond to the importer API
   - rename columns: column headers are renamed to standardize them to the 
   importer API, using a supplied dictionary. For the included Schwab 
   importer, that looks like:

        self.header_map = {
            "Action":      'type',
            "Date":        'date',
            "tradeDate":   'tradeDate',
            "Description": 'memo',
            "Symbol":      'security',
            # etc.
            }
        rdr = rdr.rename(self.header_map)

   - convert_columns: this fixes up the actual data in each column. The 
   base class does the following:
      - map types to standard types. The standard types that the 
      importer API uses are loosely based on OFX standards. For example, 
      the Schwab importer needs this mapping: 
   
        self.transaction_type_map = {
            'Bank Interest':      'income',
            'Buy':                'buystock',
            'Cash Dividend':      'dividends',
            'MoneyLink Transfer': 'transfer',
            'Reinvest Dividend':  'dividends',
            'Reinvest Shares':    'buystock',
            'Sell':               'sellstock',
            }
        rdr = rdr.convert('type', self.transaction_type_map)

      - numbers are parsed from strings and converted into Decimal type. 
      Non-numeric characters like '$' are removed.
      - dates are parsed and converted into datetime type.
   - The table is now ready for use by the importer. petl makes each row 
   available as a namedtuple!

-- 
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/78be2431-abe8-4e25-982b-8cd15dcd671bn%40googlegroups.com.
