On Sun, Feb 4, 2018 at 7:23 AM, 'Patrick Ruckstuhl' via Beancount < [email protected]> wrote:
>>>>> - Using prices in imports
>>>>>
>>>>> For some imports I would like to enhance the transactions with prices
>>>>> based on the current/daily price. I'm currently fetching and storing
>>>>> prices in beancount, so prices are available in the beancount file, but
>>>>> I'm not sure what the best way to hook this into the importer framework
>>>>> is.
>>>>
>>>> Fetching prices automatically /is/ OTOH intended to be automated.
>>>> (Note that we're in a funny situation right now with both the Yahoo and
>>>> Google Finance APIs disabled.)
>>>>
>>>> These are two separate processes at the moment; run one, then the
>>>> other. Concatenate to a file if you want to.
>>>
>>> That's what I'm doing. What I'm looking for is how to access the prices
>>> from the beancount file in the importer; do I have to parse/load the
>>> beancount file on my own?
>>
>> Yes.
>> You'd call beancount.loader.load_file() on your existing file, and then
>> build a price_map dict.
>> Grep for "price_map" in the source code; you'll find several examples of
>> doing that.
>
> I'm wondering if it would make sense to slightly enhance the
> ImporterProtocol. Right now bean-extract already has the ability to parse
> an existing beancount file and use it for duplicate detection.
> Now if those entries could be forwarded to the importer, it would open up
> some use cases such as:
>
> * custom duplicate logic (e.g. let's say my import file has a unique
>   identifier which I map to metadata on the transaction; if I now get the
>   existing import entries, I can make sure to only import new transactions
>   and either completely ignore duplicates or tag them with the
>   __duplicate__ meta)
>
> * my use case where I need data (e.g. prices) from the existing beancount
>   file to enhance the new entries
>
> I think all that would be needed is to add existingEntries to the extract
> method:
>
>     importer.extract(file, existing_entries)
>
> That way there is no additional parsing of the beancount file needed, and
> there is a clear way to define which file to parse (e.g. the same way to
> do it when called from Fava as well as from bean-extract).

That's an interesting idea. It's an easy change to make.

+ Note that if we add the entries to the extractor, it opens up the
  possibility for the particulars of the extractor to depend on particulars
  of previously imported transactions. To paraphrase your example, if an
  extractor knows that its input file contains a unique transaction id
  column and it consistently attaches that as a "link" on the transaction,
  it can then use that fact to very reliably flag transactions as
  duplicates in the future by inspecting the link field of those
  transactions (assuming the user hasn't removed them in the text). That
  may be a good thing, because that kind of check may NOT be generalizable
  across different importers, unless we'd establish some sort of guarantee
  that some links represent globally unique identifiers. In a sense, the
  current method for flagging duplicates assumes that a general method for
  detecting duplicates - after fiddling and manual adjustments by the
  user - exists.

- On the downside, preventing access to the previous entries essentially
  decouples the duplicate detection method from the importer logic, which
  forces the duplicate logic to remain generic. The importer having access
  to the prior directives creates a logical dependency between it and the
  duplicate detection.

I'm not sure we have to worry about that. Given that the duplicate logic
has been iffy ever since it existed, I think it's a reasonable thing to
try. Let's do it and see what happens, and whether people start relying on
it.
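To make the proposal concrete, here is a schematic sketch of what an
importer's extract(file, existing_entries) could do with the prior entries:
enrich new transactions with prices found in the existing file, and mark
duplicates by a unique id carried as a link. The data structures are
simplified stand-ins (plain dicts and sets), not the real beancount
directive types, and all names are illustrative.

```python
def build_price_map(existing_entries):
    """Collect the latest known rate per currency from prior price entries."""
    price_map = {}
    for entry in existing_entries:
        if entry.get("type") == "price":
            price_map[entry["currency"]] = entry["rate"]  # later entries win
    return price_map


def collect_known_ids(existing_entries):
    """Gather unique transaction ids previously attached as links."""
    known = set()
    for entry in existing_entries:
        known.update(entry.get("links", ()))
    return known


def extract(rows, existing_entries=None):
    """Turn raw import rows into entries, enriched and deduplicated
    against the previously imported file (if one was provided)."""
    existing_entries = existing_entries or []
    price_map = build_price_map(existing_entries)
    known_ids = collect_known_ids(existing_entries)
    new_entries = []
    for row in rows:
        entry = {
            "type": "transaction",
            "date": row["date"],
            "units": row["units"],
            "currency": row["currency"],
            # Enhance with a price taken from the existing file, if known.
            "price": price_map.get(row["currency"]),
            "links": {row["uid"]},
        }
        if row["uid"] in known_ids:
            # Custom duplicate logic: flag instead of silently re-importing.
            entry["meta"] = {"__duplicate__": True}
        new_entries.append(entry)
    return new_entries
```

With the real API, existing_entries would be the directives returned by
beancount.loader.load_file(), and the price lookup would go through a
price_map built from them.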
To be fair, I think more work could simply be done on the duplicate logic
to make it more resilient, but in the interest of flexibility, let's add
this. So here's the change:

- The Importer.extract() method now accepts a new parameter with the prior
  entries (or None, if not specified). It's free to use that as it pleases.

- The entries returned by Importer.extract() will be checked for
  __duplicate__ metadata and automatically inserted into that set if it is
  present. This allows the importer to return some duplicate entries for
  context - which will be rendered as such in the output, e.g. commented
  out - without having to necessarily throw them away.

- The current importer parameters are still supported as legacy (I really
  didn't want to break everyone's importers with this API change, so I
  inspect the signature).

Here:
https://bitbucket.org/blais/beancount/commits/f9728f0c9594fae38e3ff7fa7e1f8dd2190ab6da

--
You received this message because you are subscribed to the Google Groups
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/beancount/CAK21%2BhNBa7hM_DPtYeL5bUxTip06WoUnBcUsjRL6%2BNpYB_vz1w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
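P.S. The backwards-compatible dispatch mentioned above ("I inspect the
signature") could look roughly like this sketch. It uses the standard
library's inspect.signature to decide whether an importer's extract()
accepts the second parameter; the function name call_extract is
illustrative, not the actual code in the commit.

```python
import inspect


def call_extract(importer, file, existing_entries):
    """Call importer.extract() with existing_entries only if the
    importer's method accepts a second parameter; otherwise fall back
    to the legacy single-argument call."""
    # For a bound method, signature() excludes 'self', so a legacy
    # extract(self, file) shows one parameter here.
    params = inspect.signature(importer.extract).parameters
    if len(params) >= 2:
        # New-style importer: receives the prior entries (possibly None).
        return importer.extract(file, existing_entries)
    # Legacy importer: the old signature keeps working unchanged.
    return importer.extract(file)
```

This way existing importers run unmodified, while new ones can opt in
simply by declaring the extra parameter.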
