Re: Newbie Setting Up CSV Import / Ingest

doshying Sat, 15 Sep 2018 21:15:44 -0700

Hey, 

I'm in a very similar boat. Were you able to post your importer files 
publicly? I think seeing the conversation of you working through this, 
along with your finished files, would make them a lot easier to 
understand than the current examples I've seen.


Cheers,


On Friday, 20 July 2018 02:22:48 UTC+10, [email protected] wrote:
>
> I figured it out. The dumb_categorizer calls .lower() on the narration, and 
> I was passing it a search term with a capital letter in it, so it never 
> matched. Now I'm off to the races.. :)
>
> I think I might publish my working setup once I get it all cleaned 
> up, as yet another example for others to follow.
>
> TRS-80
>
> -- 
> Securely sent with Tutanota. Claim your encrypted mailbox today! 
> https://tutanota.com
>
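To illustrate the pitfall described above: because the categorizer lowercases
the narration before testing it, a search term containing a capital letter can
never match. Below is a minimal sketch of a case-insensitive test. The helper
name narration_matches is hypothetical (it is not part of beancount), and the
narration is the sample row quoted further down in this thread.

```python
def narration_matches(narration, term):
    """Case-insensitive substring test: both sides are lowercased."""
    return term.lower() in narration.lower()


narration = "Withdrawal Debit Card SOME BAR & GRILL CITY ST Card XXXX"

# The pitfall: a capitalized term compared against a lowercased narration.
print("Grill" in narration.lower())           # False
# Lowercasing the term as well lets the match succeed.
print(narration_matches(narration, "Grill"))  # True
```

Writing the search terms in lowercase to begin with has the same effect.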
> 19. Jul 2018 10:44 by [email protected]:
>
> OK, I am successfully calling dumb_categorizer from the CSV Importer by 
> defining it at the beginning of the .config file, and then passing 
> categorizer = dumb_categorizer to the CSV Importer. I know this because I 
> replaced it with a simple print("something") and I got a bunch of 
> "something" on stdout. So the categorizer is getting called, it's just 
> either not matching or not attaching the other leg... ?
>
> Any help would be greatly appreciated.
>
> TRS-80
>
> 19. Jul 2018 08:52 by [email protected]:
>
> I suppose I should have included a link to the CSV importer source: 
> https://bitbucket.org/blais/beancount/src/80d30d6896cf5fdcff8c1156cab77107ee8e0f96/beancount/ingest/importers/csv.py?at=default&fileviewer=file-view-default
>
> Down toward the bottom (line 283) is where the categorizer gets called.
>
> Last night at my local LUG, I volunteered to do a talk next month on plain 
> text accounting, and got the green light. So it would be nice to get this 
> working by then. :)
>
> TRS-80
>
> 19. Jul 2018 08:32 by [email protected]:
>
> It is still unclear to me where to put this categorizer code. I have tried 
> putting it here, there, and everywhere. I am using the provided generic CSV 
> importer, which calls it, but I cannot figure out where to put it or how to 
> instantiate it or whatever it is you need to do in Python.
>
> Since I don't really know Python, I am happy to pay someone a few bucks to 
> help me get this working.
>
> (from 
> https://bitbucket.org/blais/beancount/pull-requests/24/improve-ingestimporterscsv/diff):
>
> def dumb_categorizer(txn):
>     # At this time the txn has only one posting
>     try:
>         posting1 = txn.postings[0]
>     except IndexError:
>         return txn
>
>     # Guess the account(s) of the other posting(s)
>     if 'nutella' in txn.narration.lower():
>         account = 'Expenses:Food'
>     else:
>         return txn
>
>     # Make the other posting(s)
>     posting2 = posting1._replace(
>         account=account,
>         units=-posting1.units
>     )
>
>     # Insert / Append the posting into the transaction
>     if posting1.units < posting2.units:
>         txn.postings.append(posting2)
>     else:
>         txn.postings.insert(0, posting2)
>
>     return txn
>
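The same idea can be extended from one hard-coded term to a small table of
rules. The sketch below is only an illustration: Posting and Transaction are
namedtuple stand-ins so that it runs without beancount installed, the amounts
are plain Decimal values rather than beancount Amount objects, and the RULES
table, the function name rule_categorizer, and the Expenses:Restaurant account
are all hypothetical.

```python
from collections import namedtuple
from decimal import Decimal

# Stand-ins for beancount's data types so the sketch runs on its own.
# (Assumption: only the fields used below are modeled.)
Posting = namedtuple("Posting", "account units")
Transaction = namedtuple("Transaction", "narration postings")

# Hypothetical rules: lowercase search term -> account for the other leg.
RULES = {
    "nutella": "Expenses:Food",
    "bar & grill": "Expenses:Restaurant",
}


def rule_categorizer(txn):
    """Append a balancing posting when a rule matches the narration."""
    if not txn.postings:
        return txn
    first = txn.postings[0]
    narration = txn.narration.lower()
    for term, account in RULES.items():
        if term in narration:
            # The other leg carries the opposite amount.
            txn.postings.append(Posting(account, -first.units))
            break
    return txn


txn = Transaction(
    "Withdrawal Debit Card SOME BAR & GRILL CITY ST",
    [Posting("Assets:Suncoast:Checking-G", Decimal("-59.83"))],
)
txn = rule_categorizer(txn)
print(txn.postings[1])
```

Keeping every key in RULES lowercase avoids the capital-letter mismatch
mentioned at the top of this thread. Transactions that match no rule are
returned unchanged, so they can still be categorized by hand.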
>
> 25. Jun 2018 16:33 by [email protected]:
>
> OK, stayed up late last night and actually got all my character stripping 
> accomplished in Python within the provided tools. Yay me (first Python code 
> I ever wrote)! :)
>
> OK, so the basic CSV importers are working. Now I'm trying to figure out 
> where to stick the categorizer code I found here: 
> https://bitbucket.org/blais/beancount/pull-requests/24/improve-ingestimporterscsv/diff
>
> I have been trying here and there without success so far. Any hints/pointers 
> would be greatly appreciated!
>
> TRS-80
>
> 24. Jun 2018 15:21 by [email protected]:
>
> On Sun, Jun 24, 2018 at 11:58 AM <[email protected]> wrote:
>
>> [...]But by all means, please correct me if I am wrong, or have missed 
>> something.
>>
>> So now that I have attained some success, and see the light at the end of 
>> the tunnel, it looks like I will have to do ~ the following:
>> 1. Manually download CSV file from bank.
>>
> Yes
>  
>
>> 2. Do some pre-processing, either manually or with macros in Emacs, or 
>> (more likely) programmatically, using scripts and sed, etc. to remove parens 
>> and $s.
>>
> You can write code in your importer to do that.
>  
>
>> 3. Run the actual bean-import.
>>
> You mean bean-extract.
>
>> 4. Run some post processing (I would like to change date: metadata name to 
>> transaction_date: because I think it's more descriptive).
>>
> Do that in your importer code as well.
>  
>
>> 5. And then finally hand copy these transactions into my main .beancount 
>> file, double checking and tweaking (aka "clearing") them in the process, 
>> categorizing remaining ones into Expense accounts and perhaps updating my 
>> scripts in the process.
>>
> Yes.
>
>> I suppose 2, 4, and 5 could be done all in Emacs, but I'll just have to 
>> figure out some workflow now that works for me.
>>
> Yes.
>  
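For step 2 above (removing the parentheses and dollar signs), the cleanup can
be a small Python function rather than a sed script. This is a minimal sketch,
not code from beancount: the name parse_amount is hypothetical, and it assumes
that parentheses mark a negative amount, which may or may not be the sign
convention a given importer expects for a separate Withdrawal column.

```python
from decimal import Decimal


def parse_amount(text):
    """Convert strings like '$229.15' or '($59.83)' to Decimal.

    Assumption: parentheses mark a negative amount, as is common in
    accounting exports. Empty cells return None.
    """
    text = text.strip()
    if not text:
        return None
    negative = text.startswith("(") and text.endswith(")")
    # Drop the parentheses, the currency symbol, and thousands separators.
    cleaned = text.strip("()").replace("$", "").replace(",", "")
    value = Decimal(cleaned)
    return -value if negative else value


print(parse_amount("($59.83)"))  # -59.83
print(parse_amount("$229.15"))   # 229.15
```

Using Decimal rather than float keeps the cents exact.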
>
>>
>> Also not mentioned is somehow programmatically inserting the other leg of 
>> the transaction (which Expense account). I agree with Martin's basic 
>> philosophy on this, and still plan on manually reviewing everything, 
>> however I am already seeing that the bulk of transactions are at the same 
>> places in my case and could easily be categorized with some simple matching 
>> (either in a post-matching script or within bean-extract using 
>> categorizer). I need to look into this more, and also experiment or read up 
>> on how the de-duplication works, as I think it's probably related.
>>
>
> You can write some function for your importer to do that with your 
> particular rules if it saves you time.
>
>
>> Anyway, I will continue to report on what I find as I go along, and even 
>> though I'm not getting any replies 
>>
> Short emails with direct questions -> more replies more quickly
>
>  
>
>> hopefully this will either encourage others to try and set this up or 
>> perhaps help other noobs who come along later looking for more in depth 
>> info (or perhaps stumble across similar error messages searching the 
>> internet) and it eventually helps someone.
>>
>> Helpful tips, encouraging words, or even just letting me know if anyone 
>> is actually reading my idiotic ramblings are always welcomed. :D
>>
>
> Sounds like you're making great progress!
> Unfortunately automating the importing still requires writing Python code 
> and I see no way around that. I wish it were easier.
>
>  
>
>>
>> TRS-80
>>
>> 22. Jun 2018 19:21 by [email protected]:
>>
>> Yeah I was completely on the wrong track before (I think). But I am on 
>> the right one now (I think)?
>>
>> So what I have done is just copy the csv.py file and save it as 
>> __init__.py in my importers/suncoast_g directory. Then I put the following 
>> into ledger.config: 
>> https://paste.pound-python.org/show/popHoa0wvVE2OiPCqIAL
>>
>> But now when doing bean-extract I get "ValueError: CSV config without 
>> header has non-index fields: {'[DATE]': 'Posted Date', '[TXN_DATE]': 
>> 'Transaction Date', '[NARRATION1]': 'Description', '[CREDIT]': 'Deposit', 
>> '[DEBIT]': 'Withdrawal', '[BALANCE]': 'Balance'}"
>>
>> Yes, my CSV files have headers. I have been searching the internet for that 
>> error, but am still scratching my head. I also tried changing '[DATE]' to 
>> 'DATE' etc., but that didn't seem to make a difference either.
>>
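The error text itself says the importer treated the file as having no header,
and in that case it appears to accept only integer column positions rather
than column names. One possible workaround, offered as an assumption and not
as a confirmed fix for this beancount version, would be to give the config
integer indexes. The hypothetical helper header_indexes below only shows how
the names in this CSV map to positions.

```python
import csv
import io


def header_indexes(header_line, wanted):
    """Map each wanted column name to its 0-based position in the header."""
    header = next(csv.reader(io.StringIO(header_line)))
    positions = {name.strip(): i for i, name in enumerate(header)}
    return {name: positions[name] for name in wanted}


header = "Posted Date,Transaction Date,Description,Deposit,Withdrawal,Balance"
print(header_indexes(header, ["Posted Date", "Description", "Balance"]))
# {'Posted Date': 0, 'Description': 2, 'Balance': 5}
```

With positions like these, a config entry such as 'Posted Date' would become 0,
'Description' would become 2, and so on.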
>> Of course, I could be completely off track (this is my fourth different 
>> approach). I have been flailing around at this all day and a good part of 
>> yesterday too. Early in the morning until late at night. At this point I 
>> would be willing to send someone a few dollars to help me get this set up. 
>> I am sure I could get other accounts working and maintain it once I can 
>> just get the first one working.
>>
>> When I first saw my credit union's CSV file I thought "this should be 
>> easy" because it's very straightforward. I don't need all this complicated 
>> parsing like I have seen in some of the other Importers I have been 
>> studying. Just a straight CSV import. Or so I thought... :/
>>
>> Anyway, any help at all would be greatly appreciated at this point. Any 
>> clue might help!
>>
>> TRS-80
>>
>> 22. Jun 2018 14:19 by [email protected]:
>>
>> OK I sought and received some help in @python. I think I am on a much 
>> better track now. I don't know where I got my original __init__.py from, 
>> some similar thread here I think.
>>
>> But now I have downloaded from source the utrade one from: 
>> https://bitbucket.org/blais/beancount/src/65212d1176bb427a7883d2593edbd0e0545a145a/examples/ingest/office/importers/utrade/__init__.py?at=default&fileviewer=file-view-default
>>  
>> and am modifying that to my needs. I now see that I missed a whole bunch of 
>> the methods listed in "Writing an Importer" section of "Importing External 
>> Data" Docs. It will take me a while to work through it but I will post 
>> something back later, including results. I just didn't want anyone to spend 
>> time posting a long reply in the meantime.
>>
>> Fun fun! :)
>>
>> TRS-80
>>
>>
>> 22. Jun 2018 12:08 by [email protected]:
>>
>> OK, so this is quite challenging for someone who doesn't really know 
>> Python. However I think it's a good exercise not only for myself but also 
>> to help other newbies who would like to try and get this awesome feature 
>> working.
>>
>> I have read everything I can in source and mailing list about CSV Import 
>> / Ingest and I've made some progress, but now I'm stuck. 
>>
>> Apologies in advance for ugly formatting. Google Groups apparently does not 
>> support inline text formatting, and I am communicating with the group via 
>> email.
>>
>> I've tried to (mostly) follow the naming conventions in the examples but 
>> it seems they have changed over time. Anyway, file structure looks like so:
>> ~/fin
>>     |---documents
>>     |---Downloads
>>     |---importers
>>     |    |---suncoast_g
>>     |         |---__init__.py   (this file shared below)
>>     |    |---__init__.py        (this file is empty)
>>     |---ledger.beancount
>>     |---ledger.config         (I have seen this also referenced as .import in docs)
>>
>> Here is my ledger.config file:
>> --------------------(begin ledger.config file)--------------------
>> #!/usr/bin/env python3
>> """Example import configuration."""
>>
>> # Insert our custom importers path here.
>> # (In practice you might just change your PYTHONPATH environment.)
>> import sys
>> from os import path
>> sys.path.insert(0, path.join(path.dirname(__file__)))
>>
>> from importers import suncoast_g
>> #from importers import acme_pdf
>>
>> from beancount.ingest import extract
>> #from beancount.ingest.importers import ofx
>>
>>
>> # Setting this variable provides a list of importer instances.
>> #
>> # Removed the following from below to replace with my own, saved for reference
>> #
>> #    utrade.Importer("USD",
>> #                    "Assets:US:UTrade",
>> #                    "Assets:US:UTrade:Cash",
>> #                    "Income:US:UTrade:{}:Dividend",
>> #                    "Income:US:UTrade:{}:Gains",
>> #                    "Expenses:Financial:Fees",
>> #                    "Assets:US:BofA:Checking"),
>> #
>> #    ofx.Importer("379700001111222",
>> #                 "Liabilities:US:CreditCard",
>> #                 "bofa"),
>> #
>> #    acme_pdf.Importer("Assets:US:AcmeBank"),
>> #
>> CONFIG = [
>>     suncoast_g.Importer("Assets:Suncoast:Checking-G"),
>> ]
>>
>>
>> # Override the header on extracted text (if desired).
>> extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'
>> --------------------(end ledger.config file)--------------------
>>
>> OK now the __init__.py that is in suncoast_g contains following:
>> --------------------(begin __init__.py file)--------------------
>> #!/usr/bin/env python3
>>
>> #
>> # Configuration file for extracting Suncoast-G data
>> #
>>
>> from beancount.ingest import regression
>> from beancount.ingest.importers import csv
>>
>> from beancount.plugins import auto_accounts
>>
>>
>> class Importer(csv.Importer):
>>
>>     config = {csv.Col.DATE: 'Posted Date',
>>               csv.Col.TXN_DATE: 'Transaction Date',
>>               csv.Col.NARRATION: 'Description',
>>               csv.Col.AMOUNT_CREDIT: 'Deposit',
>>               csv.Col.AMOUNT_DEBIT: 'Withdrawal',
>>               csv.Col.BALANCE: 'Balance'}
>>
>>     def __init__(self, account):
>>         csv.Importer.__init__(
>>             self, self.config,
>>             account, 'Currency',
>>             ('Posted Date,Transaction Date,Description,'
>>              'Deposit,Withdrawal,Balance'),
>>             1)
>>
>>     def get_description(self, row):
>>         payee, narration = super().get_description()
>>         narration = '{} ({})'.format(narration, row.category)
>>         return payee, narration
>> --------------------(end __init__.py file)--------------------
>>
>> I have just copied this stuff and tried to figure it out. I'm sure I've 
>> got something wrong in here but I don't really know what I'm doing. FYI 
>> here is what the data looks like which is in G.csv in Downloads:
>>
>> Posted Date,Transaction Date,Description,Deposit,Withdrawal,Balance
>> 6/4/2018,6/4/2018,Withdrawal Debit Card SOME BAR & GRILL CITY ST Card XXXX,,($59.83),$229.15
>>
>> OK I think that's all the relevant info. So now when I do:
>>
>> ~/fin$ bean-identify ledger.config Downloads
>>
>> I get:
>>
>> **** /home/myname/fin/Downloads/A Sunnet History 6186156 23032018_21062018.csv
>> **** /home/myname/fin/Downloads/G.csv
>>
>> Which I think means it is identifying those 2 files (the only ones in 
>> there) as CSV, correct? I will point out that G.csv is an Asset account and 
>> is my first target here. The other one is a Liability account (credit card) 
>> and therefore has different fields (only one amount, and no balance). But I 
>> figure once I get this one working, that other one (and subsequent others) 
>> should be pretty easy.
>>
>> OK so now when I do:
>>
>> ~/fin$ bean-extract ledger.config Downloads
>>
>> I get:
>>
>> **** /home/myname/fin/Downloads/A Sunnet History 6186156 23032018_21062018.csv
>>
>> **** /home/myname/fin/Downloads/G.csv
>>
>> ERROR:root:Importer importers.suncoast_g.Importer: "Assets:Suncoast:Checking-G".extract() raised an unexpected error: CSV config without header has non-index fields: {<Col.DATE: '[DATE]'>: 'Posted Date', <Col.TXN_DATE: '[TXN_DATE]'>: 'Transaction Date', <Col.NARRATION: '[NARRATION1]'>: 'Description', <Col.AMOUNT_CREDIT: '[CREDIT]'>: 'Deposit', <Col.AMOUNT_DEBIT: '[DEBIT]'>: 'Withdrawal', <Col.BALANCE: '[BALANCE]'>: 'Balance'}
>>
>> ERROR:root:Traceback: Traceback (most recent call last):
>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/extract.py", line 187, in extract
>>     allow_none_for_tags_and_links=allow_none_for_tags_and_links)
>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/extract.py", line 69, in extract_from_file
>>     new_entries = importer.extract(file, **kwargs)
>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/importers/csv.py", line 189, in extract
>>     iconfig, has_header = normalize_config(self.config, file.head())
>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/importers/csv.py", line 340, in normalize_config
>>     "{}".format(config))
>> ValueError: CSV config without header has non-index fields: {<Col.DATE: '[DATE]'>: 'Posted Date', <Col.TXN_DATE: '[TXN_DATE]'>: 'Transaction Date', <Col.NARRATION: '[NARRATION1]'>: 'Description', <Col.AMOUNT_CREDIT: '[CREDIT]'>: 'Deposit', <Col.AMOUNT_DEBIT: '[DEBIT]'>: 'Withdrawal', <Col.BALANCE: '[BALANCE]'>: 'Balance'}
>>
>> ;; -*- mode: org; mode: beancount; coding: utf-8; -*-
>>
>> And this is where I'm currently stuck. I feel like it's something dumb, 
>> something not pointing at something else correctly but I don't know enough 
>> Python (yet) to figure it out myself. Any halp would be greatly 
>> appreciated. :)
>>
>> TRS-80
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Beancount" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/beancount/LFcF9ZJ--3-0%40tutanota.com 
>> <https://groups.google.com/d/msgid/beancount/LFcF9ZJ--3-0%40tutanota.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
