Re: smart importer newbie question

Jonathan Goldman Wed, 19 May 2021 17:02:38 -0700

Hi Patrick,

Thanks for the suggestions. I started doing this. Here is what I'm seeing:


------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank
------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank


Here is the code I added to predictor.py:

#beg
        print('---__call__----')
        print(self.account)
        #print(existing_entries)
#end

        with self.lock:
            self.define_pipeline()
            self.train_pipeline()
            return self.process_entries(imported_entries)

    def load_open_accounts(self, existing_entries):
        """Return map of accounts which have been opened but not closed."""
        account_map = {}
        if not existing_entries:
            return

        for entry in beancount_sorted(existing_entries):
            # pylint: disable=isinstance-second-argument-not-valid-type
            if isinstance(entry, Open):
                account_map[entry.account] = entry
            elif isinstance(entry, Close):
                account_map.pop(entry.account)

        self.open_accounts = account_map

    def load_training_data(self, existing_entries):
        """Load training data, i.e., a list of Beancount entries."""
        training_data = existing_entries or []
        self.load_open_accounts(existing_entries)
#beg1
        print('------CHECKPOINT1-------')
        print(len(training_data))
#end1

        training_data = list(filter_txns(training_data))
        print(len(training_data))
        length_all = len(training_data)
        training_data = [
            txn for txn in training_data if self.training_data_filter(txn)
        ]
        print(len(training_data))
#beg2
        print('------CHECKPOINT2-------')
        print(training_data)
#end2
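
The three numbers printed under CHECKPOINT1 read as a funnel: 1353 entries in total, 1133 left after filter_txns, and 0 left after training_data_filter, so everything appears to be dropped in that last step. Below is a minimal sketch of such a funnel. It uses stand-in tuples rather than the real Beancount classes, and the filter rule (every posting account is open, and at least one posting touches the import account) is an assumption for illustration; the actual training_data_filter in smart_importer may differ. The Expenses:* account names are made up.

```python
from collections import namedtuple

# Hypothetical stand-ins for Beancount directives; the real classes live in
# beancount.core.data and carry more fields.
Open = namedtuple("Open", "account")
Posting = namedtuple("Posting", "account")
Transaction = namedtuple("Transaction", "narration postings")


def funnel(entries, import_account):
    """Return (all entries, transactions, transactions kept by the filter)."""
    open_accounts = {e.account for e in entries if isinstance(e, Open)}
    txns = [e for e in entries if isinstance(e, Transaction)]

    def keep(txn):
        # Assumed rule: every posting account is open, and at least one
        # posting touches the account being imported.
        accounts = [p.account for p in txn.postings]
        return (all(a in open_accounts for a in accounts)
                and import_account in accounts)

    kept = [t for t in txns if keep(t)]
    return len(entries), len(txns), len(kept)


entries = [
    Open("Assets:US:Banks:Checking:myBank"),
    Open("Expenses:Rent"),
    Transaction("Rent", [Posting("Assets:US:Banks:Checking:myBank"),
                         Posting("Expenses:Rent")]),
    # Expenses:Coffee is never opened, so this transaction is dropped.
    Transaction("Coffee", [Posting("Assets:US:Banks:Checking:myBank"),
                           Posting("Expenses:Coffee")]),
]

print(funnel(entries, "Assets:US:Banks:Checking:myBank"))  # (4, 2, 1)
print(funnel(entries, "Assets:US:Banks:Checking:Other"))   # (4, 2, 0)
```

Under this assumed rule, a final count of 0 means either that no transaction posts to the account printed by self.account, or that each transaction touches at least one account missing from the open-account map.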


--------

I'm now checking that every account in the config file is present in
my beancount file. I noticed one was missing, and fixing that changed what
was in training_data, but I'm still getting the warning about the training
data being empty. I'll keep digging as best I can, but I could definitely
use any additional help.
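
The account check described above could be scripted roughly as follows. This is a sketch only: it scans the ledger text for "open" and "close" directives with a regular expression instead of using the Beancount loader, so it does not follow include directives, and the sample ledger and its dates are invented for illustration.

```python
import re

# Matches Beancount directives such as
# "2020-01-01 open Assets:US:Banks:Checking:myBank USD".
DIRECTIVE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ \t]+(open|close)[ \t]+(\S+)", re.MULTILINE
)


def open_accounts(ledger_text):
    """Return the set of accounts that are opened and not closed again."""
    accounts = set()
    # Stable sort by date only, so same-day directives keep file order.
    for _date, kind, account in sorted(DIRECTIVE.findall(ledger_text),
                                       key=lambda m: m[0]):
        if kind == "open":
            accounts.add(account)
        else:
            accounts.discard(account)
    return accounts


def missing_accounts(config_accounts, ledger_text):
    """Return the config accounts without a live open directive, sorted."""
    return sorted(set(config_accounts) - open_accounts(ledger_text))


# Invented sample ledger; a real run would read the journal file instead.
ledger = """\
2020-01-01 open Assets:US:Banks:Checking:myBank USD
2020-01-01 open Income:US:Interest:myBank
2020-01-01 open Expenses:US:Bank-Fees:myBank
"""
config = [
    "Assets:US:Banks:Checking:myBank",
    "Income:US:Interest:myBank",
    "Expenses:US:Bank-Fees:myBank",
    "Equity:US:Rounding-Errors:Imports",
]
print(missing_accounts(config, ledger))
# ['Equity:US:Rounding-Errors:Imports']
```

For a real ledger, the text could be read with pathlib.Path("journal/myledger.beancount").read_text() and the config list filled in from the importer settings.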

On Wed, May 19, 2021 at 3:16 AM 'Patrick Ruckstuhl' via Beancount <
[email protected]> wrote:

> Hi Jonathan,
>
>
> Let's try to figure this out. In smart importer, can you print out the
> following:
>
>
> in smart_importer/predictor.py
>
>
> in __call__ around line 64
>
> print(self.account)
>
> print(existing_entries)
>
>
> in load_training_data around line 91
>
> print(training_data)
>
> and around line 95
>
> print(training_data)
>
>
> That should give an idea where the information is "lost". Depending on
> where the information is lost, you can then dig a bit deeper into what is
> happening.
>
>
> Regards,
>
> Patrick
>
>
>
>
>
> On 18.05.2021 13:14, Jonathan Goldman wrote:
>
> Thanks Red.
>
> bean-query works fine on my input file, which now has >1000 transactions.
>
> Ready with 1344 directives (2266 postings in 1133 transactions).
> beancount>
>
> I still get the error. I'm not sure what is causing it or how to
> debug it. The only other issue I recall seeing was some error with
> fund_info or something in getting prices, but I thought it was an unrelated
> issue.
>
> Does anyone have suggestions on where/how to debug? E.g., should I
> print some variables to STDOUT at such and such a point inside
> smart_importer code or inside bean-extract?
>
> thanks,
> Jonathan
>
>
>
> On Mon, May 17, 2021 at 9:34 PM [email protected] <[email protected]>
> wrote:
>
>> A minimum of two transactions should suffice for smart_importer. More
>> will increase prediction quality, but two should suffice. I can't tell
>> what's happening at your end, but you're likely ending up with zero
>> transactions for some reason. Run bean-query on the file you pass to "-f"
>> of bean-extract.
>>
>> beancount-reds-importers supports smart_importer out of the box for
>> banking, that shouldn't be an issue AFAICT.
>>
>>
>>
>> On Wednesday, May 12, 2021 at 10:23:14 PM UTC-7 [email protected] wrote:
>>
>>> Thanks for suggestions @Patrick and Alan. My beancount file has about 64
>>> Asset accounts. It has about 41 expense accounts. I have only 2 months of
>>> labelled banking transactions (about 42 transactions) all associated with
>>> one bank account and various expense accounts.
>>>
>>> I had thought that some transactions were relatively deterministic (same
>>> $ amount and same description like rent/mortgage) and I was under the
>>> impression that only a few months of data are needed to get going.
>>>
>>> Perhaps I'll just go back to manually labelling data for now and trying
>>> again later or after I see more posts/explanation of smart_importer. I'm
>>> not well-versed enough with smart_importer to debug what is happening.
>>>
>>> On Thu, May 13, 2021 at 3:04 AM Alan H <[email protected]> wrote:
>>>
>>>> I get this error when there are insufficient entries in the journal to
>>>> teach the smart_importer how to file new transactions. Specifically there
>>>> are no matches for payees or narrations.
>>>>
>>>> Is that the case? Try adding a dummy transaction that matches the
>>>> narration in the import file.
>>>>
>>>> Alan
>>>>
>>>>
>>>> On Wednesday, May 12, 2021 at 12:24:55 PM UTC+1 [email protected]
>>>> wrote:
>>>>
>>>>> Hm, actually that looks ok, it has the existing_entries on the
>>>>> interface. But to be honest I'm not super familiar with how the apply hook
>>>>> is hooking this in, so there might be an issue.
>>>>>
>>>>> Maybe someone more familiar with this can respond on that.
>>>>>
>>>>>
>>>>> Otherwise if you could install smart_importer from git and then maybe
>>>>> add a bit more debug output in
>>>>>
>>>>> hooks.py and predictor.py to make sure that the existing entries
>>>>> arrive, this would give a better idea how to progress.
>>>>>
>>>>>
>>>>> On 12.05.2021 13:17, [email protected] wrote:
>>>>>
>>>>> Thank you. I think that is it.
>>>>>
>>>>> I'm using reds-importers and I see
>>>>> site-packages/beancount_reds_importers/libimport/banking.py and it has
>>>>> this entry:
>>>>>
>>>>> def extract(self, file, existing_entries=None):
>>>>>
>>>>> I think this importer tool needs to be updated to support the
>>>>> smart_importer.
>>>>>
>>>>> On Wednesday, May 12, 2021 at 11:11:37 PM UTC+12 [email protected]
>>>>> wrote:
>>>>>
>>>>>> I just remembered something. The issue could be that the importer
>>>>>> you're trying to use does not have the new interface and instead still
>>>>>> uses the old (legacy) interface.
>>>>>>
>>>>>> the new one looks like this
>>>>>>
>>>>>>
>>>>>> def extract(self, file, existing_entries):
>>>>>>
>>>>>> the old one looks like this
>>>>>>
>>>>>> def extract(self, file):
>>>>>>
>>>>>>
>>>>>> Smart importer uses the existing_entries for training its model.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12.05.2021 12:20, [email protected] wrote:
>>>>>>
>>>>>> Just checked and I got the same result. I can add some debugging code
>>>>>> in the config file perhaps. I'm not very experienced with beancount or
>>>>>> smart_importer so not sure what to look for.
>>>>>>
>>>>>> bean-extract -e journal/accounts.beancount jonathan_smart.import
>>>>>> ~/staging/mydata.qfx  > ~/staging/dud.txt
>>>>>>
>>>>>> gives 2 printouts of
>>>>>>
>>>>>> Cannot train the machine learning model because the training data is
>>>>>> empty.
>>>>>>
>>>>>> Cannot train the machine learning model because the training data is
>>>>>> empty.
>>>>>> On Wednesday, May 12, 2021 at 7:15:19 PM UTC+12 [email protected]
>>>>>> wrote:
>>>>>>
>>>>>>> Can you try -e instead of -f? That's what I use.
>>>>>>>
>>>>>>>
>>>>>>> On May 12, 2021 8:31:36 AM GMT+02:00, "[email protected]" <
>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> Thanks for the suggestion @Patrick. I just tried changing that but
>>>>>>>> still doesn't work. I get the exact same behavior if I call it with an
>>>>>>>> empty file....seems the -f option doesn't make bean-extract behave as
>>>>>>>> expected for me. Here is my call:
>>>>>>>>
>>>>>>>> bean-extract -f journal/myledger.beancount jonathan_smart.import
>>>>>>>> ~/staging/62090_818496_1013051ofxdl.qfx  > ~/staging/dud.txt
>>>>>>>> I get these messages:
>>>>>>>>
>>>>>>>> Cannot train the machine learning model because the training data
>>>>>>>> is empty.
>>>>>>>>
>>>>>>>> Cannot train the machine learning model because the training data
>>>>>>>> is empty.
>>>>>>>>
>>>>>>>> On Wednesday, May 12, 2021 at 5:31:25 PM UTC+12 [email protected]
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think your setup looks good, the smart importer hook is in there
>>>>>>>>> as otherwise you would not get the errors about not able to train.
>>>>>>>>>
>>>>>>>>> I think the issue is on your call
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f
>>>>>>>>> journal/myledger.beancount > ~/staging/dud.txt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My guess is that the -f argument needs to come before you specify
>>>>>>>>> the importconfig and the location, so
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> bean-extract -f journal/myledger.beancount jonathan_smart.import
>>>>>>>>> ~/staging/new_bank_data.qfx > ~/staging/dud.txt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On 12.05.2021 01:58, [email protected] wrote:
>>>>>>>>>
>>>>>>>>> Thanks for looking at this module even though you aren't using it!
>>>>>>>>>
>>>>>>>>> I followed the code that was further down on the readme page
>>>>>>>>> <https://github.com/beancount/smart_importer> that describes how
>>>>>>>>> to convert an existing importer.
>>>>>>>>> >>
>>>>>>>>> from your_custom_importer import MyBankImporter
>>>>>>>>> from smart_importer import apply_hooks, PredictPayees,
>>>>>>>>> PredictPostings
>>>>>>>>>
>>>>>>>>> my_bank_importer = MyBankImporter('whatever', 'config', 'is',
>>>>>>>>> 'needed')
>>>>>>>>> apply_hooks(my_bank_importer, [PredictPostings(),
>>>>>>>>> PredictPayees()])
>>>>>>>>> CONFIG = [ my_bank_importer, ]
>>>>>>>>> >>
>>>>>>>>> (my code looks just like this example)
>>>>>>>>>
>>>>>>>>> I had thought apply_hooks would operate on the importer, so when I
>>>>>>>>> call it in config I can just then call the hookified bank_importer.
>>>>>>>>> Is this not the case?
>>>>>>>>>
>>>>>>>>> On Wednesday, May 12, 2021 at 1:26:27 AM UTC+12 [email protected]
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> * Disclaimer * I have never actually run smart importer.
>>>>>>>>>>
>>>>>>>>>> Looking at the README on GitHub for smart importer it looks like
>>>>>>>>>> you need to use the return object of apply_hooks in your CONFIG list.
>>>>>>>>>>
>>>>>>>>>> CONFIG = [
>>>>>>>>>> apply_hooks(MyBankImporter(account='Assets:MyBank:MyAccount'),
>>>>>>>>>> [PredictPostings()]) ]
>>>>>>>>>>
>>>>>>>>>> In your config you apply the hooks but are not using the returned
>>>>>>>>>> object.
>>>>>>>>>>
>>>>>>>>>> Hope that helps.
>>>>>>>>>>
>>>>>>>>>> On Tuesday, 11 May 2021 at 04:06:33 UTC+1 [email protected]
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to get smart_importer to work and not sure what I'm
>>>>>>>>>>> doing wrong.
>>>>>>>>>>>
>>>>>>>>>>> *1*. I have successfully done all the required beancount setup,
>>>>>>>>>>> created my own bank importer, and ran it on two months of data.
>>>>>>>>>>> *2.* I then manually labelled about 2 months of data from one
>>>>>>>>>>> of my banks.
>>>>>>>>>>> *3.* I installed smart_importer using "pip install
>>>>>>>>>>> smart_importer"
>>>>>>>>>>>
>>>>>>>>>>> (base) MacBook-Air:beandata jonathan$ pip show smart_importer
>>>>>>>>>>>
>>>>>>>>>>> Name: smart-importer
>>>>>>>>>>>
>>>>>>>>>>> Version: 0.3
>>>>>>>>>>>
>>>>>>>>>>> Summary: Augment Beancount importers with machine learning
>>>>>>>>>>> functionality.
>>>>>>>>>>>
>>>>>>>>>>> Home-page: https://github.com/beancount/smart_importer
>>>>>>>>>>>
>>>>>>>>>>> Author: Johannes Harms
>>>>>>>>>>>
>>>>>>>>>>> Author-email: UNKNOWN
>>>>>>>>>>>
>>>>>>>>>>> License: MIT
>>>>>>>>>>>
>>>>>>>>>>> Location:
>>>>>>>>>>> /Users/jonathan/opt/miniconda3/lib/python3.8/site-packages
>>>>>>>>>>>
>>>>>>>>>>> Requires: scikit-learn, beancount, numpy, scipy
>>>>>>>>>>>
>>>>>>>>>>> *4.* I created a new config file I called jonathan_smart.import
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (base) MacBook-Air:beandata jonathan$ more jonathan_smart.import
>>>>>>>>>>>
>>>>>>>>>>> #!/usr/bin/env python3
>>>>>>>>>>>
>>>>>>>>>>> """Import configuration."""
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> import sys
>>>>>>>>>>>
>>>>>>>>>>> from os import path
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> sys.path.insert(0, path.join(path.dirname(__file__)))
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> from beancount_reds_importers import vanguard
>>>>>>>>>>>
>>>>>>>>>>> from myimporters.bfsfcu import bfsfcu_bank
>>>>>>>>>>>
>>>>>>>>>>> from myimporters.anz import anz_bank
>>>>>>>>>>>
>>>>>>>>>>> from fund_info import *
>>>>>>>>>>>
>>>>>>>>>>> from smart_importer import apply_hooks, PredictPayees,
>>>>>>>>>>> PredictPostings
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> myBank_smart_importer = my_bank.Importer({
>>>>>>>>>>>
>>>>>>>>>>>         'main_account'   : 'Assets:US:Banks:Checking:myBank',
>>>>>>>>>>>
>>>>>>>>>>>         'account_number' : 'xxx',
>>>>>>>>>>>
>>>>>>>>>>>         'transfer'       :
>>>>>>>>>>> 'Assets:US:Zero-Sum-Accounts:Transfers:Bank-Account',
>>>>>>>>>>>
>>>>>>>>>>>         'income'         : 'Income:US:Interest:myBank',
>>>>>>>>>>>
>>>>>>>>>>>         'fees'           : 'Expenses:US:Bank-Fees:myBank',
>>>>>>>>>>>
>>>>>>>>>>>         'rounding_error' : 'Equity:US:Rounding-Errors:Imports',
>>>>>>>>>>>
>>>>>>>>>>>     })
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> apply_hooks(myBank_smart_importer, [PredictPayees(),
>>>>>>>>>>> PredictPostings()])
>>>>>>>>>>>
>>>>>>>>>>> CONFIG = [myBank_smart_importer, ...(other importers)]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *5*. I was following the README documentation, which said to run
>>>>>>>>>>> bean-extract -f to invoke it on existing data. So I tried the
>>>>>>>>>>> following. *Is this right?*
>>>>>>>>>>>
>>>>>>>>>>> bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx
>>>>>>>>>>> -f journal/myledger.beancount > ~/staging/dud.txt
>>>>>>>>>>>
>>>>>>>>>>> Cannot train the machine learning model because the training
>>>>>>>>>>> data is empty.
>>>>>>>>>>>
>>>>>>>>>>> Cannot train the machine learning model because the training
>>>>>>>>>>> data is empty.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The output is just like the normal output without all the
>>>>>>>>>>> smart_importer stuff.  Seems I'm doing something wrong as the
>>>>>>>>>>> staging/dud.txt doesn't have any predictions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Appreciate any assistance on this!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> Jonathan
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Beancount" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/beancount/820ef641-8178-47d1-9e97-afbc709e6a83n%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/beancount/820ef641-8178-47d1-9e97-afbc709e6a83n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/CANUAcYdNeEw9UjFsZzq3RmcusEVkjZS_XzS1h1PPA2JUPp9Sjw%40mail.gmail.com.
