Created https://github.com/beancount/smart_importer/pull/109 to improve this a little bit

On 21.05.2021 15:26, kuba jamro wrote:
That's great news.

I think the error message could be more helpful. If you
have time, it would be useful to raise an issue on GitHub requesting
an improved message for this case.

Jakub.

On Fri, 21 May 2021 at 06:07, Jonathan Goldman <[email protected]> wrote:
Hi Patrick and everyone,

I resolved the issue, and it's working well. The pointers on where to add print 
statements were very helpful. The problem was that the account name on the existing 
transactions was incorrect. After I fixed it, the model is able to train and 
predict.

Thanks again.
Jonathan

On May 21, 2021, at 6:24 AM, 'Patrick Ruckstuhl' via Beancount 
<[email protected]> wrote:

Probably the easiest examples are the data-driven tests, which you can find here:

https://github.com/beancount/smart_importer/tree/master/tests/data


The simplest of them is probably:

https://github.com/beancount/smart_importer/blob/master/tests/data/multiaccounts.beancount



On 20.05.2021 19:03, Hawrylyshen, Alan wrote:

Might it be simpler (sorry for suggesting the obvious) to try a toy example 
data set to get things up and working?
It didn't take too much effort to get the smart_importer wrapping my 
importers... so I imagine this is something relatively simple.
Ideally there'd be a test case in the smart_importer repository already?

Thanks
Alan

On Thu, 20 May 2021 at 13:34, 'Patrick Ruckstuhl' via Beancount 
<[email protected]> wrote:
So if I see this correctly, after the filtering of the training data, there is 
never any data left.

The logic looks like this

     def training_data_filter(self, txn):
         """Filter function for the training data."""
         found_import_account = False
         for pos in txn.postings:
             if pos.account not in self.open_accounts:
                 return False
             if self.account == pos.account:
                 found_import_account = True
         return found_import_account or not self.account


And from the printout you have something in self.account. So if I see this 
correctly, either none of your training data is matching the account or the 
account is actually no longer open.

Maybe worth printing out the self.open_accounts and maybe even 
debugging/logging some stuff in that training_data_filter code
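
To make the two failure modes concrete, here is a minimal standalone sketch of the same filter logic. `Posting`, `Txn` and `keep_for_training` are hypothetical stand-ins written for illustration, not smart_importer or Beancount classes; only the logic mirrors the method quoted above.

```python
from collections import namedtuple

# Hypothetical stand-ins for Beancount's posting and transaction types.
Posting = namedtuple("Posting", "account")
Txn = namedtuple("Txn", "postings")


def keep_for_training(txn, import_account, open_accounts):
    """Mirror of the quoted filter: every posting must use an open account,
    and at least one posting must use the import account (if one is set)."""
    found_import_account = False
    for pos in txn.postings:
        if pos.account not in open_accounts:
            return False
        if import_account == pos.account:
            found_import_account = True
    return found_import_account or not import_account


IMPORT_ACCOUNT = "Assets:US:Banks:Checking:myBank"
open_accounts = {IMPORT_ACCOUNT, "Assets:Cash", "Expenses:Rent"}

good = Txn([Posting(IMPORT_ACCOUNT), Posting("Expenses:Rent")])
# Booked under a slightly different name that was never opened.
misnamed = Txn([Posting("Assets:US:Bank:Checking:myBank"), Posting("Expenses:Rent")])
# All accounts are open, but none of them is the import account.
other_account = Txn([Posting("Assets:Cash"), Posting("Expenses:Rent")])

results = [
    keep_for_training(t, IMPORT_ACCOUNT, open_accounts)
    for t in (good, misnamed, other_account)
]
print(results)  # [True, False, False]
```

If every existing transaction looks like `misnamed` or `other_account`, the filtered training data ends up empty.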


Regards,

Patrick


On 20.05.2021 02:02, Jonathan Goldman wrote:

Hi Patrick,

Thanks for the suggestions. I started doing this. Here is what I'm seeing:

------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank
------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank

Here is the code I added to predictor.py:
#beg
         print('---__call__----')
         print(self.account)
         #print(existing_entries)
#end
         with self.lock:
             self.define_pipeline()
             self.train_pipeline()
             return self.process_entries(imported_entries)

     def load_open_accounts(self, existing_entries):
         """Return map of accounts which have been opened but not closed."""
         account_map = {}
         if not existing_entries:
             return

         for entry in beancount_sorted(existing_entries):
             # pylint: disable=isinstance-second-argument-not-valid-type
             if isinstance(entry, Open):
                 account_map[entry.account] = entry
             elif isinstance(entry, Close):
                 account_map.pop(entry.account)

         self.open_accounts = account_map

     def load_training_data(self, existing_entries):
         """Load training data, i.e., a list of Beancount entries."""
         training_data = existing_entries or []
         self.load_open_accounts(existing_entries)
#beg1
         print('------CHECKPOINT1-------')
         print(len(training_data))
#end1
         training_data = list(filter_txns(training_data))
         print(len(training_data))
         length_all = len(training_data)
         training_data = [
             txn for txn in training_data if self.training_data_filter(txn)
         ]
         print(len(training_data))
#beg2
         print('------CHECKPOINT2-------')
         print(training_data)
#end2

--------
I'm trying to check now that every account in the config file is present in my 
beancount file. I noticed one missing and that changed what was in the 
training_data but still getting the warning about training data being empty. 
I'll keep digging as best I can but definitely can use any additional help.
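
The check described above can be sketched as a set difference. The `(kind, account)` tuples and the `find_unopened` helper are hypothetical and stand in for real Beancount Open/Close entries; the account names are taken from the config posted earlier in the thread.

```python
def open_account_names(directives):
    """Return accounts that were opened and not closed afterwards.

    `directives` is a date-ordered list of (kind, account) tuples, a
    simplified stand-in for Beancount Open/Close entries.
    """
    opened = set()
    for kind, account in directives:
        if kind == "open":
            opened.add(account)
        elif kind == "close":
            opened.discard(account)  # tolerate closing an unknown account
    return opened


def find_unopened(config_accounts, directives):
    """List configured accounts that the ledger does not have open."""
    return sorted(set(config_accounts) - open_account_names(directives))


ledger = [
    ("open", "Assets:US:Banks:Checking:myBank"),
    ("open", "Income:US:Interest:myBank"),
    ("open", "Expenses:US:Bank-Fees:myBank"),
    ("close", "Expenses:US:Bank-Fees:myBank"),
]
config = [
    "Assets:US:Banks:Checking:myBank",
    "Income:US:Interest:myBank",
    "Expenses:US:Bank-Fees:myBank",
    "Equity:US:Rounding-Errors:Imports",
]
missing = find_unopened(config, ledger)
print(missing)
```

Any name that shows up in `missing` is either misspelled in the config or not open in the ledger.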

On Wed, May 19, 2021 at 3:16 AM 'Patrick Ruckstuhl' via Beancount 
<[email protected]> wrote:
Hi Jonathan,


Let's try to figure this out. In smart_importer, can you print out the 
following:


in smart_importer/predictor.py


in __call__ around line 64

print(self.account)

print(existing_entries)


in load_training_data around line 91

print(training_data)

and around line 95

print(training_data)


That should give an idea where the information is "lost". Depending on where 
the information is lost, you can then dig a bit deeper into what is happening.
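
One way to see where the data is lost without dumping whole lists is to print only the count after each step. This is a generic sketch, not smart_importer code; `count_after_each`, the toy entries and the two lambda filters are made up for illustration.

```python
def count_after_each(items, stages):
    """Apply (name, predicate) stages in order and record survivors."""
    counts = [("input", len(items))]
    for name, predicate in stages:
        items = [item for item in items if predicate(item)]
        counts.append((name, len(items)))
    return counts


# Toy entries: (directive type, account). Real entries would be Beancount objects.
entries = [
    ("txn", "Assets:Bank"),
    ("txn", "Assets:Cash"),
    ("open", "Assets:Bank"),
]
stages = [
    ("transactions only", lambda e: e[0] == "txn"),
    ("matches import account", lambda e: e[1] == "Assets:Checking"),
]
counts = count_after_each(entries, stages)
for name, n in counts:
    print(f"{name}: {n}")
```

A drop to zero at a single stage points at the filter to inspect next.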


Regards,

Patrick





On 18.05.2021 13:14, Jonathan Goldman wrote:

Thanks Red.

bean-query works fine on my input file, which now has >1000 transactions.

Ready with 1344 directives (2266 postings in 1133 transactions).
beancount>

I still get the error. I'm not sure what is causing it or how to debug 
it. The only other issue I recall seeing was some error with fund_info or 
something in getting prices, but I thought it was an unrelated issue.

Does anyone have suggestions on where or how to debug? For example, should I 
print some variables to STDOUT at a particular point inside the 
smart_importer code or inside bean-extract?

thanks,
Jonathan



On Mon, May 17, 2021 at 9:34 PM [email protected] <[email protected]> wrote:
A minimum of two transactions should suffice for smart_importer. More will increase 
prediction quality, but two should suffice. I can't tell what's happening at your end, 
but you're likely ending up with zero transactions for some reason. Run bean-query on the 
file you pass to "-f" of bean-extract.

beancount-reds-importers supports smart_importer out of the box for banking, 
so that shouldn't be an issue AFAICT.



On Wednesday, May 12, 2021 at 10:23:14 PM UTC-7 [email protected] wrote:
Thanks for suggestions @Patrick and Alan. My beancount file has about 64 Asset 
accounts. It has about 41 expense accounts. I have only 2 months of labelled 
banking transactions (about 42 transactions) all associated with one bank 
account and various expense accounts.

I had thought that some transactions were relatively deterministic (same $ 
amount and same description like rent/mortgage) and I was under the impression 
that only a few months of data are needed to get going.

Perhaps I'll just go back to manually labelling data for now and trying again 
later or after I see more posts/explanation of smart_importer. I'm not 
well-versed enough with smart_importer to debug what is happening.

On Thu, May 13, 2021 at 3:04 AM Alan H <[email protected]> wrote:
I get this error when there are insufficient entries in the journal to teach 
the smart_importer how to file new transactions. Specifically there are no 
matches for payees or narrations.

Is that the case? Try adding a dummy transaction that matches the narration in 
the import file.

Alan


On Wednesday, May 12, 2021 at 12:24:55 PM UTC+1 [email protected] wrote:
Hm, actually that looks ok, it has the existing_entries on the interface. But 
to be honest I'm not super familiar with how the apply hook is hooking this in, 
so there might be an issue.

Maybe someone more familiar with this can respond on that.


Otherwise, if you could install smart_importer from git and then maybe add a bit 
more debug output in hooks.py and predictor.py to make sure that the existing 
entries arrive, this would give a better idea how to progress.


On 12.05.2021 13:17, [email protected] wrote:

Thank you. I think that is it.

I'm using reds-importers and I see 
site-packages/beancount_reds_importers/libimport/banking.py and it has this 
entry:

def extract(self, file, existing_entries=None):

I think this importer tool needs to be updated to support the smart_importer.

On Wednesday, May 12, 2021 at 11:11:37 PM UTC+12 [email protected] wrote:
I just remembered something. The issue could be that the importer you're trying 
to use does not have the new interface and instead still uses the old (legacy) 
interface.

the new one looks like this


def extract(self, file, existing_entries):

the old one looks like this

def extract(self, file):


Smart importer uses the existing_entries for training its model.
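
A quick way to tell which interface an importer exposes is to inspect the signature of its `extract` method. The two importer classes below are hypothetical examples, and `accepts_existing_entries` is a helper written for this sketch, not part of Beancount or smart_importer.

```python
import inspect


class NewStyleImporter:
    """Hypothetical importer using the newer interface."""

    def extract(self, file, existing_entries=None):
        return []


class LegacyImporter:
    """Hypothetical importer using the old (legacy) interface."""

    def extract(self, file):
        return []


def accepts_existing_entries(importer):
    """True if importer.extract has an `existing_entries` parameter."""
    params = inspect.signature(importer.extract).parameters
    return "existing_entries" in params


new_ok = accepts_existing_entries(NewStyleImporter())
legacy_ok = accepts_existing_entries(LegacyImporter())
print(new_ok, legacy_ok)  # True False
```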


Regards,

Patrick




On 12.05.2021 12:20, [email protected] wrote:

Just checked and I got the same result. I can add some debugging code in the 
config file perhaps. I'm not very experienced with beancount or smart_importer 
so not sure what to look for.

bean-extract -e journal/accounts.beancount jonathan_smart.import 
~/staging/mydata.qfx  > ~/staging/dud.txt

gives 2 printouts of

Cannot train the machine learning model because the training data is empty.

Cannot train the machine learning model because the training data is empty.

On Wednesday, May 12, 2021 at 7:15:19 PM UTC+12 [email protected] wrote:
Can you try -e instead of -f? That's what I use.


On May 12, 2021 8:31:36 AM GMT+02:00, "[email protected]" <[email protected]> 
wrote:
Thanks for the suggestion @Patrick. I just tried changing that but it still 
doesn't work. I get the exact same behavior if I call it with an empty 
file. It seems the -f option doesn't make bean-extract behave as expected for 
me. Here is my call:

bean-extract -f journal/myledger.beancount jonathan_smart.import 
~/staging/62090_818496_1013051ofxdl.qfx  > ~/staging/dud.txt

I get these messages:

Cannot train the machine learning model because the training data is empty.

Cannot train the machine learning model because the training data is empty.


On Wednesday, May 12, 2021 at 5:31:25 PM UTC+12 [email protected] wrote:
Hi,

I think your setup looks good. The smart importer hook is in there, as otherwise 
you would not get the errors about not being able to train.

I think the issue is in your call:


bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f 
journal/myledger.beancount > ~/staging/dud.txt


My guess is that the -f argument needs to come before you specify the 
importconfig and the location, so


bean-extract -f journal/myledger.beancount jonathan_smart.import 
~/staging/new_bank_data.qfx > ~/staging/dud.txt


Regards,

Patrick


On 12.05.2021 01:58, [email protected] wrote:

Thanks for looking at this module even though you aren't using it!

I followed the code that was further down on the readme page that describes how 
to convert an existing importer.
from your_custom_importer import MyBankImporter
from smart_importer import apply_hooks, PredictPayees, PredictPostings

my_bank_importer = MyBankImporter('whatever', 'config', 'is', 'needed')
apply_hooks(my_bank_importer, [PredictPostings(), PredictPayees()])
CONFIG = [ my_bank_importer, ]
(my code looks just like this example)

I had thought apply_hooks would operate on the importer itself, so that after 
calling it in the config I can just use the hookified bank importer. Is this not the case?

On Wednesday, May 12, 2021 at 1:26:27 AM UTC+12 [email protected] wrote:
* Disclaimer * I have never actually run smart importer.

Looking at the README on GitHub for smart importer it looks like you need to 
use the return object of apply_hooks in your CONFIG list.

CONFIG = [ apply_hooks(MyBankImporter(account='Assets:MyBank:MyAccount'), 
[PredictPostings()]) ]

In your config you apply the hooks but are not using the returned object.

Hope that helps.
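
Whether the two styles differ depends on whether apply_hooks modifies the importer in place. The README example quoted elsewhere in this thread calls it without using the return value, which suggests that it does. The sketch below is a toy patcher, not smart_importer's implementation: `toy_apply_hooks`, `ToyImporter` and `tag_hook` are hypothetical names, and the in-place behaviour is an assumption.

```python
def toy_apply_hooks(importer, hooks):
    """Hypothetical in-place patcher: wraps importer.extract and returns
    the very same importer object."""
    original_extract = importer.extract

    def patched_extract(file, existing_entries=None):
        entries = original_extract(file, existing_entries)
        for hook in hooks:
            entries = hook(entries, existing_entries)
        return entries

    importer.extract = patched_extract
    return importer


class ToyImporter:
    def extract(self, file, existing_entries=None):
        return ["raw entry"]


def tag_hook(entries, existing_entries):
    """Toy hook that marks each entry as processed."""
    return [entry + " (predicted)" for entry in entries]


importer = ToyImporter()
returned = toy_apply_hooks(importer, [tag_hook])
extracted = importer.extract("bank.qfx")

print(returned is importer)  # True
print(extracted)             # ['raw entry (predicted)']
```

Under that assumption, putting either the original variable or the returned object into CONFIG gives the same hooked importer.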

On Tuesday, 11 May 2021 at 04:06:33 UTC+1 [email protected] wrote:
Hi,

I'm trying to get smart_importer to work and not sure what I'm doing wrong.

1. I have successfully done all the required beancount setup, created my own 
bank importer, and run it on two months of data.
2. I then manually labelled about 2 months of data from one of my banks.
3. I installed smart_importer using "pip install smart_importer"

(base) MacBook-Air:beandata jonathan$ pip show smart_importer
Name: smart-importer
Version: 0.3
Summary: Augment Beancount importers with machine learning functionality.
Home-page: https://github.com/beancount/smart_importer
Author: Johannes Harms
Author-email: UNKNOWN
License: MIT
Location: /Users/jonathan/opt/miniconda3/lib/python3.8/site-packages
Requires: scikit-learn, beancount, numpy, scipy

4. I created a new config file that I called jonathan_smart.import


(base) MacBook-Air:beandata jonathan$ more jonathan_smart.import
#!/usr/bin/env python3
"""Import configuration."""

import sys
from os import path

sys.path.insert(0, path.join(path.dirname(__file__)))

from beancount_reds_importers import vanguard
from myimporters.bfsfcu import bfsfcu_bank
from myimporters.anz import anz_bank
from fund_info import *
from smart_importer import apply_hooks, PredictPayees, PredictPostings

myBank_smart_importer = my_bank.Importer({
    'main_account'   : 'Assets:US:Banks:Checking:myBank',
    'account_number' : 'xxx',
    'transfer'       : 'Assets:US:Zero-Sum-Accounts:Transfers:Bank-Account',
    'income'         : 'Income:US:Interest:myBank',
    'fees'           : 'Expenses:US:Bank-Fees:myBank',
    'rounding_error' : 'Equity:US:Rounding-Errors:Imports',
})

apply_hooks(myBank_smart_importer, [PredictPayees(), PredictPostings()])
CONFIG = [myBank_smart_importer, ...(other importers)]


5. I was following the README documentation, which said to use bean-extract -f to 
invoke it on existing data. So I tried the following. Is this right?

bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f 
journal/myledger.beancount > ~/staging/dud.txt

Cannot train the machine learning model because the training data is empty.

Cannot train the machine learning model because the training data is empty.


The output is just like the normal output without all the smart_importer stuff. 
 Seems I'm doing something wrong as the staging/dud.txt doesn't have any 
predictions.


Appreciate any assistance on this!


thanks,

Jonathan
--
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/820ef641-8178-47d1-9e97-afbc709e6a83n%40googlegroups.com.

--
a l a n a t p o l y p h a s e d o t c a