Hi Florian,
On 28.04.2019 11:22, Florian Lindner wrote:
On Sunday, 28 April 2019, 01:44:07 CEST, Martin Blais wrote:
> On Sat, Apr 27, 2019 at 6:28 PM Florian Lindner
<[email protected]> wrote:
>
> > Hello,
> >
> >
> > I am new to beancount / ledger and am currently thinking about how to do my
> > importing. I have written an importer for the CSV statements from my bank.
> > I have two questions:
> >
> >
> > + I would like to automatically rename the payee of some frequently
> > occurring transactions, such as grocery shopping, and assign them to
> > accounts. Is there a canonical way to do that, or should I just hack it into
> > the importer?
> >
>
> You should build it into your importer.
> This task is simple enough that there's really no need for the library to
> provide common code to do that.
> You can roll your own.
Ok, you're right, that's easy.
You might also want to have a look at smart_importer:
https://github.com/beancount/smart_importer
It uses machine-learning-based approaches to set payees and accounts
automatically.
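If you go the hand-rolled route Martin suggests, a minimal sketch of a rule table could look like the following. It is plain Python with no beancount dependency; the patterns, payee names, and account names are made-up placeholders, and `classify` is a hypothetical helper you would call from your own importer before building the transaction.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    pattern: re.Pattern  # matched against the raw payee text from the CSV
    payee: str           # cleaned-up payee to write into the ledger
    account: str         # account to book the other leg to


# Hypothetical rules: patterns, payees, and accounts are placeholders.
RULES = [
    Rule(re.compile(r"REWE|EDEKA|ALDI", re.IGNORECASE),
         "Groceries", "Expenses:Food:Groceries"),
    Rule(re.compile(r"DEUTSCHE BAHN|DB VERTRIEB", re.IGNORECASE),
         "Deutsche Bahn", "Expenses:Transport:Train"),
]


def classify(raw_payee: str) -> Tuple[str, Optional[str]]:
    """Return (payee, account) for the first matching rule.

    If no rule matches, the raw payee is returned unchanged and the
    account is None, so the importer can leave the posting open.
    """
    for rule in RULES:
        if rule.pattern.search(raw_payee):
            return rule.payee, rule.account
    return raw_payee, None
```

The first matching rule wins, so more specific patterns should come before more general ones.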
> > + How can I detect duplicates when I try to import the same transaction
> > twice?
> >
>
> That's a more difficult question.
> You should implement your own duplication detection code.
> It's not entirely obvious how to do this for everybody; the definition of
> what's a duplicate depends on how much you manually massage your
> transactions.
> I haven't really tried very hard to generalize this well, so it's best you
> define your own code for that.
Some brainstorming:
+ When beancount/fava talk about duplicates, they seem to mean mostly
duplicate transactions created by transferring from a credit card to a
checking account and importing statements for both.
+ Save the original CSV line as metadata "source-line:".
Alternatively, build a unique tuple of (original payee, date,
amount) and save that as metadata. For each entry to import, query
beancount for an entry with matching metadata. Ledger does it like that when
--rich-data is given: it computes a hash (called UUID) from the input
line. Is there a distinct name for that metadata field that you suggest?
Fava mentions a __source__ key, but that seems to be removed before
committing
(https://github.com/beancount/fava/blob/master/fava/help/import.md).
+ Using the payee from beancount is not a good idea, as it has usually
been modified manually.
What are your thoughts?
There's actually some infrastructure for this in core beancount,
and some more in smart_importer:
https://github.com/beancount/smart_importer/blob/master/smart_importer/detector.py
DuplicateDetector sets the __duplicate__ metadata based on a
specified matching algorithm:

    apply_hooks(MyImporter(), [PredictPostings(), DuplicateDetector()]),

The default algorithm compares things like amount, accounts, and dates, but
you can also customize it. For example, I have cases where I actually get a
reference number and want to use that one, so I store the reference number
in the metadata as 'ref':

    class ReferenceDuplicatesComparator:
        def __call__(self, entry1, entry2):
            # Two entries are duplicates if both carry the same 'ref' value.
            return ('ref' in entry1.meta and 'ref' in entry2.meta
                    and entry1.meta['ref'] == entry2.meta['ref'])

    apply_hooks(MyImporter(), [PredictPostings(),
        DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),

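If your bank does not give you a reference number, the hash idea from your brainstorming can play the same role. The following is only a sketch under assumptions: `fingerprint` and `split_new` are hypothetical helpers, not part of beancount or smart_importer, and the set of known fingerprints is assumed to have been collected from whatever metadata key you choose to store the hash under in your existing ledger.

```python
import hashlib
from datetime import date
from decimal import Decimal


def fingerprint(txn_date: date, amount: Decimal, raw_payee: str) -> str:
    """Hash the fields that stay stable between two imports of the same row.

    The amount is normalized so that 12.5 and 12.50 hash identically, and
    the payee is the original bank text, not the manually edited one.
    """
    key = "|".join([
        txn_date.isoformat(),
        format(amount.normalize(), "f"),
        raw_payee.strip(),
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def split_new(rows, known):
    """Return (row, fingerprint) pairs whose fingerprint is not in `known`.

    rows:  iterable of (date, Decimal amount, raw payee) tuples
    known: set of fingerprints already stored in the ledger (assumed input)
    """
    fresh = []
    for row in rows:
        fp = fingerprint(*row)
        if fp not in known:
            fresh.append((row, fp))
    return fresh
```

One caveat with this approach: two genuinely separate transactions with the same date, amount, and payee produce the same fingerprint, so hashing the full CSV line (or including a running balance, if the bank exports one) may be the safer choice.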
Regards,
Patrick