Re: smart importer newbie question

'Patrick Ruckstuhl' via Beancount Thu, 20 May 2021 05:34:50 -0700

So if I see this correctly, after the filtering of the training data, there is never any data left.


The logic looks like this


    def training_data_filter(self, txn):
        """Filter function for the training data."""
        found_import_account = False
        for pos in txn.postings:
            if pos.account not in self.open_accounts:
                return False
            if self.account == pos.account:
                found_import_account = True
        return found_import_account or not self.account

And from the printout you have something in self.account. So if I see this correctly, either none of your training data matches the account, or the account is actually no longer open.

Maybe worth printing out self.open_accounts and maybe even debugging/logging some stuff in that training_data_filter code.
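The filter above can be instrumented to report why each transaction is dropped. The following is only a minimal sketch: `Posting`, `Transaction` and `explain_filter` are hypothetical stand-ins invented for illustration (they are not Beancount or smart_importer names), and the account names other than the checking account are made up.

```python
from collections import namedtuple

# Hypothetical stand-ins for Beancount's posting and transaction objects.
Posting = namedtuple("Posting", "account")
Transaction = namedtuple("Transaction", "narration postings")


def explain_filter(txn, open_accounts, import_account):
    """Mirror training_data_filter, but return the reason for a rejection.

    Returns None when the transaction would be kept as training data.
    """
    found_import_account = False
    for pos in txn.postings:
        if pos.account not in open_accounts:
            return f"posting account not open: {pos.account}"
        if pos.account == import_account:
            found_import_account = True
    if found_import_account or not import_account:
        return None
    return f"no posting on import account: {import_account}"


CHECKING = "Assets:US:Banks:Checking:myBank"
open_accounts = {CHECKING, "Expenses:Food", "Assets:Cash"}

transactions = [
    Transaction("groceries", [Posting(CHECKING), Posting("Expenses:Food")]),
    Transaction("old rent", [Posting(CHECKING), Posting("Expenses:Old")]),
    Transaction("cash lunch", [Posting("Assets:Cash"), Posting("Expenses:Food")]),
]

for txn in transactions:
    reason = explain_filter(txn, open_accounts, CHECKING)
    print(txn.narration, "->", reason or "kept")
```

If every transaction is rejected with the first reason, the open-account map is the place to look; if with the second, the account the importer reports does not appear in any posting.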



Regards,

Patrick


On 20.05.2021 02:02, Jonathan Goldman wrote:

Hi Patrick,

Thanks for the suggestions. I started doing this. Here is what I'm seeing:

    ------CHECKPOINT1-------
    1353
    1133
    0
    ------CHECKPOINT2-------
    []
    ---__call__----
    Assets:US:Banks:Checking:myBank
    ------CHECKPOINT1-------
    1353
    1133
    0
    ------CHECKPOINT2-------
    []
    ---__call__----
    Assets:US:Banks:Checking:myBank


Here is the code I added to predictor.py:

        #beg
        print('---__call__----')
        print(self.account)
        #print(existing_entries)
        #end
        with self.lock:
            self.define_pipeline()
            self.train_pipeline()
            return self.process_entries(imported_entries)

    def load_open_accounts(self, existing_entries):
        """Return map of accounts which have been opened but not closed."""
        account_map = {}
        if not existing_entries:
            return

        for entry in beancount_sorted(existing_entries):
            # pylint: disable=isinstance-second-argument-not-valid-type
            if isinstance(entry, Open):
                account_map[entry.account] = entry
            elif isinstance(entry, Close):
                account_map.pop(entry.account)

        self.open_accounts = account_map

    def load_training_data(self, existing_entries):
        """Load training data, i.e., a list of Beancount entries."""
        training_data = existing_entries or []
        self.load_open_accounts(existing_entries)
        #beg1
        print('------CHECKPOINT1-------')
        print(len(training_data))
        #end1
        training_data = list(filter_txns(training_data))
        print(len(training_data))
        length_all = len(training_data)
        training_data = [
            txn for txn in training_data if self.training_data_filter(txn)
        ]
        print(len(training_data))
        #beg2
        print('------CHECKPOINT2-------')
        print(training_data)
        #beg2


--------

I'm now trying to check that every account in the config file is present in my beancount file. I noticed one missing, and fixing that changed what was in the training_data, but I'm still getting the warning about training data being empty. I'll keep digging as best I can but can definitely use any additional help.
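The account check described here can be automated. The sketch below is an illustration under stated assumptions: `Open` and `Close` are simplified namedtuple stand-ins for the Beancount directives (not the real classes), `open_accounts` and `missing_accounts` are hypothetical helper names, and the ledger entries and dates are invented. The config keys are the account-valued ones from the import config quoted further down.

```python
import datetime
from collections import namedtuple

# Simplified, hypothetical stand-ins for Beancount's Open and Close directives.
Open = namedtuple("Open", "date account")
Close = namedtuple("Close", "date account")


def open_accounts(entries):
    """Return the set of accounts that were opened and not closed afterwards."""
    accounts = set()
    for entry in sorted(entries, key=lambda e: e.date):
        if isinstance(entry, Open):
            accounts.add(entry.account)
        elif isinstance(entry, Close):
            # discard() tolerates a Close without a matching Open.
            accounts.discard(entry.account)
    return accounts


def missing_accounts(config, entries):
    """Return the config items whose account is not currently open."""
    still_open = open_accounts(entries)
    return {key: acct for key, acct in config.items() if acct not in still_open}


ledger = [
    Open(datetime.date(2021, 1, 1), "Assets:US:Banks:Checking:myBank"),
    Open(datetime.date(2021, 1, 1), "Income:US:Interest:myBank"),
    Open(datetime.date(2021, 1, 1), "Expenses:US:Bank-Fees:myBank"),
    Close(datetime.date(2021, 3, 1), "Expenses:US:Bank-Fees:myBank"),
]
config = {
    "main_account": "Assets:US:Banks:Checking:myBank",
    "income": "Income:US:Interest:myBank",
    "fees": "Expenses:US:Bank-Fees:myBank",
    "rounding_error": "Equity:US:Rounding-Errors:Imports",
}

print(missing_accounts(config, ledger))
```

In this made-up ledger the fees account was closed and the rounding-error account was never opened, so both are reported.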

On Wed, May 19, 2021 at 3:16 AM 'Patrick Ruckstuhl' via Beancount <[email protected]> wrote:


    Hi Jonathan,


    Let's try to figure this out. In smart importer, can you print out
    the following stuff


    in smart_importer/predictor.py


    in __call__ around line 64

    print(self.account)

    print(existing_entries)


    in load_training_data around line 91

    print(training_data)

    and around line 95

    print(training_data)


    That should give an idea where the information is "lost".
    Depending on where the information is lost, you can then dig a bit
    deeper into what is happening.


    Regards,

    Patrick





    On 18.05.2021 13:14, Jonathan Goldman wrote:

    Thanks Red.

    bean-query works fine on my input file which now has >1000
    transactions.

    Ready with 1344 directives (2266 postings in 1133 transactions).

    beancount>

    I still get the error. I'm not sure what is causing it and not sure
    how to debug it. The only other issue I recall seeing was some
    error with fund_info or something in getting prices, but I thought
    it was an unrelated issue.

    Do you or does anyone have some suggestions on where/how to
    debug? E.g. which variables I should print to STDOUT, and at which
    point inside the smart_importer code or inside bean-extract.

    thanks,
    Jonathan



    On Mon, May 17, 2021 at 9:34 PM [email protected] wrote:

        A minimum of two transactions should suffice for
        smart_importer. More will increase prediction quality, but
        two should suffice. I can't tell what's happening at your
        end, but you're likely ending up with zero transactions for
        some reason. Run bean-query on the file you pass to "-f" of
        bean-extract.

        beancount-reds-importers supports smart_importer out of the
        box for banking, that shouldn't be an issue AFAICT.



        On Wednesday, May 12, 2021 at 10:23:14 PM UTC-7
        [email protected] wrote:

            Thanks for suggestions @Patrick and Alan. My beancount
            file has about 64 Asset accounts. It has about 41 expense
            accounts. I have only 2 months of labelled banking
            transactions (about 42 transactions) all associated with
            one bank account and various expense accounts.

            I had thought that some transactions were relatively
            deterministic (same $ amount and same description like
            rent/mortgage) and I was under the impression that only a
            few months of data are needed to get going.

            Perhaps I'll just go back to manually labelling data for
            now and trying again later or after I see more
            posts/explanation of smart_importer. I'm not well-versed
            enough with smart_importer to debug what is happening.

            On Thu, May 13, 2021 at 3:04 AM Alan H
            <[email protected]> wrote:

                I get this error when there are insufficient entries
                in the journal to teach the smart_importer how to
                file new transactions. Specifically there are no
                matches for payees or narrations.

                Is that the case? Try adding a dummy transaction that
                matches the narration in the import file.

                Alan


                On Wednesday, May 12, 2021 at 12:24:55 PM UTC+1
                [email protected] wrote:

                    Hm, actually that looks ok, it has the
                    existing_entries on the interface. But to be
                    honest I'm not super familiar with how the apply
                    hook is hooking this in, so there might be an issue.

                    Maybe someone more familiar with this can respond
                    on that.


                    Otherwise, if you could install smart_importer
                    from git and then maybe add a bit more debug
                    output in hooks.py and predictor.py to make sure
                    that the existing entries arrive, this would give
                    a better idea how to progress.


                    On 12.05.2021 13:17, [email protected] wrote:

                    Thank you. I think that is it.

                    I'm using reds-importers and I see
                    site-packages/beancount_reds_importers/libimport/banking.py
                    and it has this entry:

                    def extract(self, file, existing_entries=None):

                    I think this importer tool needs to be updated
                    to support the smart_importer.

                    On Wednesday, May 12, 2021 at 11:11:37 PM UTC+12
                    [email protected] wrote:

                        I just remembered something. The issue could
                        be that the importer you're trying to use
                        does not have the new interface and instead
                        still uses the old (legacy) interface.

                        the new one looks like this


                        def extract(self, file, existing_entries):

                        the old one looks like this

                        def extract(self, file):


                        Smart importer uses the existing_entries for
                        training its model.
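One way to tell which interface an importer exposes is to inspect the signature of its extract method. This is a minimal sketch using only the standard library; `LegacyImporter`, `NewImporter` and `accepts_existing_entries` are hypothetical names made up for the illustration and are not part of Beancount or smart_importer.

```python
import inspect


class LegacyImporter:
    """Toy importer with the old interface (no existing_entries)."""

    def extract(self, file):
        return []


class NewImporter:
    """Toy importer with the new interface."""

    def extract(self, file, existing_entries=None):
        return []


def accepts_existing_entries(importer):
    """Return True if importer.extract has an existing_entries parameter."""
    params = inspect.signature(importer.extract).parameters
    return "existing_entries" in params


for importer in (LegacyImporter(), NewImporter()):
    print(type(importer).__name__, accepts_existing_entries(importer))
```

Running a check like this against the real importer instance from the import config shows quickly whether the existing entries can reach it at all.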


                        Regards,

                        Patrick




                        On 12.05.2021 12:20, [email protected] wrote:

                        Just checked and I got the same result. I
                        can add some debugging code in the config
                        file perhaps. I'm not very experienced with
                        beancount or smart_importer so not sure
                        what to look for.

                        bean-extract -e journal/accounts.beancount jonathan_smart.import ~/staging/mydata.qfx > ~/staging/dud.txt

                        gives 2 printouts of

                        Cannot train the machine learning model
                        because the training data is empty.

                        Cannot train the machine learning model
                        because the training data is empty.

                        On Wednesday, May 12, 2021 at 7:15:19 PM
                        UTC+12 [email protected] wrote:

                            Can you try -e instead of -f? That's
                            what I use.


                            On May 12, 2021 8:31:36 AM GMT+02:00,
                            "[email protected]" <[email protected]>
                            wrote:

                                Thanks for the suggestion @Patrick.
                                I just tried changing that but
                                still doesn't work. I get the exact
                                same behavior if I call it with an
                                empty file....seems the -f option
                                doesn't make bean-extract behave as
                                expected for me. Here is my call:

                                bean-extract -f journal/myledger.beancount jonathan_smart.import ~/staging/62090_818496_1013051ofxdl.qfx > ~/staging/dud.txt

                                I get these messages:

                                Cannot train the machine learning
                                model because the training data is
                                empty.

                                Cannot train the machine learning
                                model because the training data is
                                empty.


                                On Wednesday, May 12, 2021 at
                                5:31:25 PM UTC+12
                                [email protected] wrote:

                                    Hi,

                                    I think your setup looks good,
                                    the smart importer hook is in
                                    there as otherwise you would
                                    not get the errors about not
                                    able to train.
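The point that the hook is active even though the return value of apply_hooks is not used can be illustrated with a toy version of the pattern. This sketch is an assumption about the general technique (patching the extract method on the instance and returning the same object); it is not smart_importer's actual code, and `ToyImporter`, `apply_hooks_sketch` and `tagging_hook` are invented names.

```python
import types


class ToyImporter:
    """Hypothetical importer that returns strings instead of Beancount entries."""

    def extract(self, file, existing_entries=None):
        return [f"entry from {file}"]


def apply_hooks_sketch(importer, hooks):
    """Wrap importer.extract so each hook post-processes the extracted entries."""
    original_extract = importer.extract  # bound method of the unpatched importer

    def patched_extract(self, file, existing_entries=None):
        entries = original_extract(file, existing_entries)
        for hook in hooks:
            entries = hook(self, file, entries, existing_entries)
        return entries

    # The instance itself is modified; returning it is only a convenience.
    importer.extract = types.MethodType(patched_extract, importer)
    return importer


def tagging_hook(importer, file, imported_entries, existing_entries):
    """Toy hook that marks every entry, standing in for a prediction step."""
    return [entry + " [predicted]" for entry in imported_entries]


bank = ToyImporter()
returned = apply_hooks_sketch(bank, [tagging_hook])

print(returned is bank)
print(bank.extract("new_bank_data.qfx"))
```

With in-place patching of this kind, putting either the original variable or the returned object into CONFIG gives the same hooked importer.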

                                    I think the issue is on your call


                                    bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f journal/myledger.beancount > ~/staging/dud.txt


                                    My guess is that the -f
                                    argument needs to come before
                                    you specify the import config
                                    and the location, so


                                    bean-extract -f journal/myledger.beancount jonathan_smart.import ~/staging/new_bank_data.qfx > ~/staging/dud.txt


                                    Regards,

                                    Patrick


                                    On 12.05.2021 01:58,
                                    [email protected] wrote:

                                    Thanks for looking at this
                                    module even though you aren't
                                    using it!

                                    I followed the code that was
                                    further down on the readme
                                    page
                                    <https://github.com/beancount/smart_importer>
                                    that describes how to convert
                                    an existing importer.
                                    >>
                                    from your_custom_importer import MyBankImporter
                                    from smart_importer import apply_hooks, PredictPayees, PredictPostings

                                    my_bank_importer = MyBankImporter('whatever', 'config', 'is', 'needed')
                                    apply_hooks(my_bank_importer, [PredictPostings(), PredictPayees()])
                                    CONFIG = [ my_bank_importer, ]
                                    >>
                                    (my code looks just like this
                                    example)

                                    I had thought apply_hooks
                                    would operate on the importer
                                    so when I call it in config I
                                    can just then call the
                                    hookified bank_importer. Is
                                    this not the case?

                                    On Wednesday, May 12, 2021 at
                                    1:26:27 AM UTC+12
                                    [email protected] wrote:

                                        * Disclaimer * I have
                                        never actually run smart
                                        importer.

                                        Looking at the README on
                                        GitHub for smart importer
                                        it looks like you need to
                                        use the return object of
                                        apply_hooks in your CONFIG
                                        list.

                                        CONFIG = [
                                            apply_hooks(MyBankImporter(account='Assets:MyBank:MyAccount'),
                                                        [PredictPostings()])
                                        ]

                                        In your config you apply
                                        the hooks but are not
                                        using the returned object.

                                        Hope that helps.

                                        On Tuesday, 11 May 2021 at
                                        04:06:33 UTC+1
                                        [email protected] wrote:

                                            Hi,

                                            I'm trying to get
                                            smart_importer to work
                                            and not sure what I'm
                                            doing wrong.

                                            1. I successfully
                                            did all the
                                            required beancount
                                            setup, created my
                                            own bank importer, and
                                            ran it on two months
                                            of data.
                                            2. I then manually
                                            labelled about 2
                                            months of data from
                                            one of my banks.
                                            3. I installed
                                            smart_importer using
                                            "pip install
                                            smart_importer"

                                            (base) MacBook-Air:beandata jonathan$ pip show smart_importer
                                            Name: smart-importer
                                            Version: 0.3
                                            Summary: Augment Beancount importers with machine learning functionality.
                                            Home-page: https://github.com/beancount/smart_importer
                                            Author: Johannes Harms
                                            Author-email: UNKNOWN
                                            License: MIT
                                            Location: /Users/jonathan/opt/miniconda3/lib/python3.8/site-packages
                                            Requires: scikit-learn, beancount, numpy, scipy

                                            4. I created a new
                                            config file I called
                                            jonathan_smart.import


                                            (base) MacBook-Air:beandata jonathan$ more jonathan_smart.import
                                            #!/usr/bin/env python3
                                            """Import configuration."""

                                            import sys
                                            from os import path

                                            sys.path.insert(0, path.join(path.dirname(__file__)))

                                            from beancount_reds_importers import vanguard
                                            from myimporters.bfsfcu import bfsfcu_bank
                                            from myimporters.anz import anz_bank
                                            from fund_info import *
                                            from smart_importer import apply_hooks, PredictPayees, PredictPostings

                                            myBank_smart_importer = my_bank.Importer({
                                                'main_account'   : 'Assets:US:Banks:Checking:myBank',
                                                'account_number' : 'xxx',
                                                'transfer'       : 'Assets:US:Zero-Sum-Accounts:Transfers:Bank-Account',
                                                'income'         : 'Income:US:Interest:myBank',
                                                'fees'           : 'Expenses:US:Bank-Fees:myBank',
                                                'rounding_error' : 'Equity:US:Rounding-Errors:Imports',
                                            })

                                            apply_hooks(myBank_smart_importer, [PredictPayees(), PredictPostings()])

                                            CONFIG = [myBank_smart_importer, ...(other importers)]


                                            5. I was following
                                            the README
                                            documentation that
                                            said to write
                                            bean-extract -f to
                                            invoke it on existing
                                            data. So I tried the
                                            following. Is this right?

                                            bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f journal/myledger.beancount > ~/staging/dud.txt

                                            Cannot train the
                                            machine learning model
                                            because the training
                                            data is empty.

                                            Cannot train the
                                            machine learning model
                                            because the training
                                            data is empty.


                                            The output is just
                                            like the normal output
                                            without all the
                                            smart_importer stuff. 
                                            Seems I'm doing
                                            something wrong as the
                                            staging/dud.txt
                                            doesn't have any
                                            predictions.


                                            Appreciate any
                                            assistance on this!


                                            thanks,

                                            Jonathan

                                    --
                                    You received this message
                                    because you are subscribed to
                                    the Google Groups "Beancount"
                                    group.
                                    To unsubscribe from this group
                                    and stop receiving emails from
                                    it, send an email to
                                    [email protected].
                                    To view this discussion on the
                                    web visit
                                    https://groups.google.com/d/msgid/beancount/820ef641-8178-47d1-9e97-afbc709e6a83n%40googlegroups.com.

