Re: Import from email notifications

'Aaron Axvig' via Beancount Sat, 07 Sep 2024 20:20:47 -0700

I did get this working but I don't think I will continue to use it. Idon't look at my finances often enough. In fact I usually pick away atupdating all my accounts every 6 months (for an hour or two on about 10evenings).

The only real downside was restaurant transactions that included tips. I would only get an email for the initial charge so my imported valuewould not have the final total. Then I would have to comb througheverything to find the issues and get my balance checks to pass.

I am going to pursue CSV import like a normal person, but I am glad tohave tried. I'll dump some code below in case it inspires and/orhorrifies anyone someday.


Enjoy!

Importing the emails was quite fiddly...bizarre MIME quoted-printablestuff. I had planned to use the Unread state to keep track of what hadbeen processed. Maybe that explains the ~5% mysteriously missingtransaction rate.


fromdatetimeimportdatetime
importquopri
importemail
importimaplib
importbase64
importos
importre
fromemailimportpolicy
fromemail.messageimportEmailMessage
fromemail.parserimportParser
frompprintimportpprint
mail=imaplib.IMAP4_SSL(email_server, 993)
mail.login(email_user, email_pass)
test2=mail.select("INBOX.CC-notifications")
test=mail.list()
type, data=mail.search(None, 'UNSEEN')
fornumindata[0].split():
typ, data=mail.fetch(num, '(RFC822)')
forresponse_partindata:
ifisinstance(response_part, tuple):
raw_email=data[0][1]# converts byte literal to string removing b''
raw_email_string=raw_email.decode('utf-8')
msg=email.message_from_string(raw_email_string)
ifmsg['Message-Id']:
subject=msg['Subject']
# Like: 'Thu, 11 Jan 2024 22:39:49 -0500 (EST)'
date_with_parens=msg['Date']
date_string=re.sub(r' \([^)]*\)', '', date_with_parens)
date=datetime.strptime(date_string, '%a, %d%b%Y %H:%M:%S %z')
filename='downloads/'+date.strftime('%Y-%m-%d_%H-%M-%S') +'.eml'
withopen(filename, 'w') asfile:
file.write(raw_email_string)
else:
print('No Message-Id for %s')

Nothing too crazy with parsing the data out of the email body. It issusceptible to changing formatting


importdecimal
importos
importre
frombeancount.coreimportamount
frombeancount.coreimportdata
frombeancount.core.amountimportAmount
frombeancount.core.numberimportD
frombeancount.ingestimportimporter
frombeancount.ingestimportregression
fromchardetimportdetect
fromdatetimeimportdatetime
fromemailimportpolicy
fromemail.parserimportParser
frompprintimportpprint
classImporter(importer.ImporterProtocol):
"""An importer for <credit card> email notifications"""
def__init__(self, lastFour, account, accounts, currency):
self.lastFour=lastFour
self.account=account
self.accounts=accounts
self.currency=currency
defidentify(self, file):
#print(self.extract_amount(self.extract_body(file)))
return(re.search(self.lastFour, self.extract_body(file)) isnotNone)
deffile_name(self, file):
returnos.path.basename(file.name).split('_')[1]
deffile_account(self, _):
returnself.account
deffile_date(self, file):
returnself.extract_date(self.extract_body(file))
defextract(self, file):
entries=[]
body=self.extract_body(file)
meta=data.new_metadata(file.name, lineno=0)
date=self.extract_date(body)
flag="txn"
payee=self.extract_payee(body)
narration=""
tags=data.EMPTY_SET
links=data.EMPTY_SET

txn=data.Transaction(meta, date, flag, payee, narration, tags, links,postings=[])

account=self.file_account(file)
liabilities_units=Amount(-self.extract_amount(body), self.currency)
expenses_units=Amount(self.extract_amount(body), self.currency)

txn.postings.append(data.Posting(account, liabilities_units, None, None,None, None))

ifpayeeinself.accounts:

txn.postings.append(data.Posting(self.accounts.get(payee),expenses_units, None, None, None, None))

else:

txn.postings.append(data.Posting("Expenses:Uncategorized",expenses_units, None, None, None, None))

entries.append(txn)
returnentries
defextract_body(self, file):
withopen(file.name, 'r', encoding='utf-8', errors='ignore') asin_file:
msg=Parser(policy=policy.default).parse(in_file)
returnmsg.get_body().get_content()
defextract_date(self, body):
# Email contains "Jan 1, 2024 at 12:00 PM ET" as an example.

date_text=re.search("(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[0-9]{1,2}, [0-9]{4} at ", body).group()

returndatetime.strptime(date_text, "%b%d, %Y at ").date()
defextract_payee(self, body):
# Email contains this as an example:

# <td align="left" style="vertical-align:top; padding:0px 0px 0px 0px;font-family:Arial, Helvetica, sans-serif; font-size:16px;color:#414042;" class="font14">Merchant</td># <td align="right" style="vertical-align:top; padding:0px 0px 0px 5px;font-family:Arial, Helvetica, sans-serif; font-size:16px;font-weight:bold; color:#414042;" class="font14">GOOGLE *SyncTech</td>

returnre.search("Merchant<\\/td>\\n.*font14\">(.*?)<\\/td>", body).group(1)
defextract_amount(self, body):
# Email contains this as an example:

# <td align="left" style="vertical-align:top; padding:0px 0px 0px 0px;font-family:Arial, Helvetica, sans-serif; font-size:16px;color:#414042;" class="font14">Amount</td># <td align="right" style="vertical-align:top; padding:0px 0px 0px 5px;font-family:Arial, Helvetica, sans-serif; font-size:16px;font-weight:bold; color:#414042;" class="font14">$5.36</td>returnD(re.search("Amount<\\/td>\\n.*font14\">\\$(.*?)<\\/td>",body).group(1))

The accountsvariable passed in to __init__ is just an array like this,for categorization of expenses:


expense_accounts={ }

expense_accounts["ADAFRUIT INDUSTRIES"]="Expenses:Shopping:Electronics-Software"

expense_accounts["ARCO"] ="Expenses:Auto-Transport:Gas-Fuel"
expense_accounts["Barnes and Noble"] ="Expenses:Shopping:Books"

On 1/7/24 00:06, Timothy Jesionowski wrote:

So the thing about beancount is that it's just python. It sounds likeyou're using a library to parse QIF files, but anything you can parseinto python can be fed into an importer. So just...figure out how toextract the relevant information from those emails and use whateverimporter you think is the most similar as a starting point.
For example, when I was writing an importer for my credit cardstatements this code<https://github.com/beancount/beancount/blob/v2/examples/ingest/office/importers/utrade/utrade_csv.py>as a starting point. And the actual parsing is just python's standardcsv library.
As for your specific case with the emails, that looks like somebespoke string parsing so I'd just throw regexes at the emails untilyou have the substrings you want.
That's what I did in my case, and it's what I'd probably do in yourcase. But if you want very fast ledger updates<https://reds-rants.netlify.app/personal-finance/the-five-minute-ledger-update/>,then you should look into using ofx<https://reds-rants.netlify.app/personal-finance/a-word-about-input-formats-use-ofx-when-you-can/>for your imports. It's on my wishlist for my own workflow, I justhaven't gotten around to it yet.
Sincerely,
Timothy Jesionowski
On Sat, Jan 6, 2024 at 2:02 PM 'Patrick Ruckstuhl' via Beancount<[email protected]> wrote:
    Hi,

    not exactly the same case, but related.


    I created an importer which wrapps around other importers for
    attachments,
    https://tariochbctools.readthedocs.io/en/latest/importers.html#mail-adapter

    Maybe you can use some of it as a base for your importer.


    Regards,

    Patrick


    On 06.01.2024 19:57, 'Aaron Axvig' via Beancount wrote:
    > Hello, new Beancount user of a few weeks here.  I'm cutting over
    from
    > Quicken as of the first of the year.  The import was a lot of
    work but
    > I have fiddled my way to a beautiful 61,000 line file of converted
    > data.  Some fixing of QIF parsers and converters, and lots of
    fixing
    > my old data.  Quicken allows some real garbage!
    >
    > Anyways, as I get ready to do my first import of transactions
    directly
    > into Beancount, it occurs to me that I get an email of every
    > transaction that occurs on my credit cards.  With some email
    rules and
    > cron jobs this could potentially keep my credit card accounts
    > perpetually up to date on at least a daily basis if not better. 
    And
    > it would be verified on a monthly basis with ending balances, etc.
    >
    > Has anyone put some thought into this technique?  Any
    roadblocks? Are
    > any parts of such a processing flow already built?
    >
    > Some sample data from emails is below.  I think account, amount,
    > merchant, and date is all one needs?
    >
    > Example 1:
    >
    > You made a $5.36 transaction
    > Account     Visa (...1234)
    > Date     Jan 1, 2024 at 12:00 PM ET
    > Merchant     GOOGLE *SomeApp
    > Amount     $5.36
    >
    > Example 2:
    >
    > Amount: $16.07
    > Card Ending In 4321
    > Merchant ETSY, INC. 718-8557955 US
    > Date 01/02/2024
    > Time 10:19 AM E
    >
--You received this message because you are subscribed to the Google
    Groups "Beancount" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:beancount%[email protected]>.
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/beancount/f56658f7-92e0-4d2f-8646-a28e6117424a%40tario.org.

--
You received this message because you are subscribed to the GoogleGroups "Beancount" group.To unsubscribe from this group and stop receiving emails from it, sendan email to [email protected].To view this discussion on the web visithttps://groups.google.com/d/msgid/beancount/CAOVsoWTd1Y0M-%3D-7R7OPJN%3DRVtCRu5uYRtHzcAUX0Jtwkcc6-g%40mail.gmail.com<https://groups.google.com/d/msgid/beancount/CAOVsoWTd1Y0M-%3D-7R7OPJN%3DRVtCRu5uYRtHzcAUX0Jtwkcc6-g%40mail.gmail.com?utm_medium=email&utm_source=footer>.


--
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/8efc1197-867a-4e02-827a-327e8911b1b4%40axvig.com.

Re: Import from email notifications

Reply via email to