On 12/09/2022 11:54, John Koala wrote:
Hi,

Yes, sorry, in the context of V2 still I'm afraid...

Perhaps you already know of a "fuzzy string matcher" for transaction narrations/payees?

I am not sure I grok the question: how could a fuzzy string matcher be specialized for transaction narrations or payees?

I didn't have much luck with "smart_importer" and decided the scipy/numpy/etc dependency was a PITA so am (or was) thinking to knock up a plugin to complete my imported transactions.

Fuzzy string matching is not all there is to writing a machine learning classifier. I think that 'pip install scikit-learn' is immensely easier than rolling your own algorithms.

Maybe if you provide more details on how smart_importer does not work for you, someone can help you in making it work.

Is a plugin the correct idea?

I don't think so.

A plugin operates on the transactions read from a ledger after Beancount has completed booking (the process in which all the postings in all the transactions are balanced, padding amounts are calculated, lots are computed, etc.). The transactions processed in this phase already need to have all their postings completed.

Also, a plugin does not have a way to serialize the completed transactions into a ledger. Unless you hack something together, your plugin would run every time you load your ledger and would have to do its job again. This would make fixing any mistake the automatic categorization algorithm makes rather cumbersome.

Why do you think a plugin is a better approach?

I noted that the importer is provided with an `existing_entries` list of transactions, which seems a very useful suite of items to match against.  But can I reach that from the plugin?

That what? A plugin has access to all the transactions in the ledger on which Beancount is operating. In this context there is no notion of another ledger to which a batch of transactions will be added.

Where/how? And is that even a good idea?  (It's not going to re-read the entire history for every imported transaction, is it? Hmm, I'd tolerate that nonetheless :-))  I'm assuming a `beancount.loader.load_file` inside the plugin would create some recursive silliness?

The Beancount parser and loader are capable of loading more than one file in the same process. However, there is no protection against a plugin that recursively tries to load the Beancount ledger from which it has been invoked. If you want to try, the ledger filename is available as the "filename" entry in the "options_map" passed to the plugin entry point.

The way I would approach this, if you want a solution independent of the import framework, is to use beancount.parser.parser.parse_file() to parse the transactions from a ledger, use the technique you like the most to complete or rewrite the transactions, and write them back with beancount.parser.printer.print_entries().
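
A rough sketch of that parse, complete, print round trip, assuming Beancount v2. The helpers `guess_account` and `complete_ledger`, the `known` mapping of lowercase narrations to accounts, and the difflib matching are all placeholders made up for illustration; swap in whatever technique you prefer. It only touches transactions that have a single posting:

```python
import difflib


def guess_account(narration, known):
    """Return the account of the closest known narration, or None (placeholder logic)."""
    match = difflib.get_close_matches(narration.lower(), list(known), n=1, cutoff=0.6)
    return known[match[0]] if match else None


def complete_ledger(in_path, out_path, known):
    """Parse in_path, add a guessed second posting where possible, write out_path."""
    # Imported here so guess_account() can be tried without Beancount installed.
    from beancount.core import data
    from beancount.parser import parser, printer

    # parse_file() only parses: no booking, no plugins, no balance checks.
    entries, errors, _options_map = parser.parse_file(in_path)
    completed = []
    for entry in entries:
        if isinstance(entry, data.Transaction) and len(entry.postings) == 1:
            account = guess_account(entry.narration, known)
            if account is not None:
                # Amount left empty so Beancount infers it when booking.
                extra = data.Posting(account, None, None, None, None, None)
                entry = entry._replace(postings=entry.postings + [extra])
        completed.append(entry)
    with open(out_path, "w") as out:
        printer.print_entries(completed, file=out)
    return errors
```

For example, `complete_ledger("import.beancount", "completed.beancount", {"tesco stores": "Expenses:Groceries"})` would write the completed entries to a separate file (both file names are hypothetical) that you can review and fix by hand before merging it into your main ledger.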

Cheers,
Dan
