On Sat, Jul 22, 2023 at 3:34 AM Daniele Nicolodi <[email protected]> wrote:

> On 21/07/23 23:06, Eric Altendorf wrote:
> > I'm trying to figure out whether I can use the Beangulp import driver
> > with hooks, or if I need to write my own driver to call my importers and
> > do postprocessing.  As you may recall, my workflow is atypical, as I
> > have no curated Beancount ledger file; my source of truth is my input
> > data files and the Beancount ledger is a built artifact for running
> > analysis.
> >
> > There are two things I'd like to do that I don't think are currently
> > possible; I'd appreciate feedback on whether these seem like things
> > Beangulp should support (I could contribute a patch), or if I'm better
> > off finding a different solution:
> >
> > - I'd like to deduplicate entries among different importers in a single
> > run, not just dedup against a pre-existing ledger
>
> I was going to reply that this is already supported, then I realized
> that I never merged the patch implementing it:
> https://github.com/beancount/beangulp/pull/64
> I'm going to rebase and merge it ASAP.
>

That's great!  I have pulled the latest code, but it doesn't seem to be
deduplicating the items I expect.  Let me check my assumptions:

I'm not sure how one is supposed to run multiple importers at once; the doc
<https://docs.google.com/document/d/1O42HgYQBQEna6YpobTqszSgTGnbRX7RdjmzR2xumfjs/edit#heading=h.9lk1l7gqxxfs>
mostly describes running just one.  So I'm currently using a Python script
that builds a list of importers and then runs Ingest, as follows.  Is this
correct, or am I missing some other setup code?

import beangulp

if __name__ == '__main__':
    importers = get_importers()  # my own helper; returns a list of importer instances
    hooks = []
    cli = beangulp.Ingest(importers, hooks).cli
    cli()

The deduplication is supposed to run by default, correct?

There seems to be a fairly good default implementation of similarity
comparison, yes?

Deduplication will happen among entries from *different* importers running
in the same run, right?
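
To make that last question concrete, here is a rough sketch of the behavior I
have in mind.  It is not Beangulp's implementation: the Txn type, the
is_similar rule and the dedup_across_importers function are all stand-ins I
made up for illustration, using only the standard library.

```python
import datetime
from collections import namedtuple
from decimal import Decimal

# Stand-in for a Beancount transaction (hypothetical; real entries have postings).
Txn = namedtuple("Txn", "date amount narration")

def is_similar(a, b, window_days=2):
    """Made-up similarity rule: equal amounts and dates within a small window."""
    return a.amount == b.amount and abs((a.date - b.date).days) <= window_days

def dedup_across_importers(batches):
    """Keep an entry only if no similar entry came from an earlier importer."""
    kept = []
    for batch in batches:
        earlier = list(kept)  # snapshot: compare against earlier importers only
        for entry in batch:
            if not any(is_similar(entry, other) for other in earlier):
                kept.append(entry)
    return kept

d = datetime.date
card_csv = [Txn(d(2023, 7, 1), Decimal("-50.00"), "Grocery store")]
bank_csv = [Txn(d(2023, 7, 2), Decimal("-50.00"), "GROCERY STORE #12"),
            Txn(d(2023, 7, 3), Decimal("12.00"), "Dividend")]
result = dedup_across_importers([card_csv, bank_csv])
```

Here the second importer's grocery entry would be dropped as a duplicate of
the first importer's, while the dividend is kept.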



>
> > - I'd like to be able to emit the output file globally sorted by date
> > (first the official entry date, then secondarily by a timestamp attached
> > to the metadata) rather than grouped by import file.  (Broadly this will
> > make it easier for me to debug issues sequentially, and ordering
> > within-day may alleviate some of the issues I've seen with same-day
> > purchase & transfer transactions.)
>
> It is trivial to post-process the output of beangulp to apply any
> ordering you like. Indeed I do something very similar for ledgers.
> Writing from memory:
>
> from beancount.parser import parser
> from beancount.parser import printer
>
> def key(entry):
>      return (entry.date, entry.meta['timestamp'])
>
> entries, errors, options = parser.parse_file(filename)
> entries.sort(key=key)
> printer.print_entries(entries)
>

Hmm, OK, that may work fine, thanks.
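
One thing I would need to handle is entries that have no timestamp in their
metadata.  A minimal sketch of the sort key I'm picturing, using a stand-in
Entry type rather than real Beancount directives, and assuming (my own
convention, not anything Beancount defines) that the timestamp is stored as
a datetime.time under the 'timestamp' key:

```python
import datetime
from collections import namedtuple

# Stand-in for a Beancount entry (hypothetical); only date and meta matter here.
Entry = namedtuple("Entry", "date meta narration")

def sort_key(entry):
    # Entries lacking a timestamp sort first within their day.
    timestamp = entry.meta.get("timestamp", datetime.time.min)
    return (entry.date, timestamp)

d, t = datetime.date, datetime.time
entries = [
    Entry(d(2023, 7, 2), {}, "c"),
    Entry(d(2023, 7, 1), {"timestamp": t(15, 0)}, "b"),
    Entry(d(2023, 7, 1), {"timestamp": t(9, 30)}, "a"),
    Entry(d(2023, 7, 1), {}, "x"),
]
ordered = sorted(entries, key=sort_key)
```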


>
> > And just to double check that this should already be possible:
> >
> > - I'd like to be able to add entries (i.e., account declarations,
> > initial balance pads, etc.) via a hook
>
> You can do this as part of the sorting post-processing step, or with a
> beancount plugin. See for example the beancount.plugins.auto_accounts
> (and other) plugins.
>

Cool, sounds good.  I hadn't dug into plugins yet.
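
If I understand the plugin interface correctly, a plugin is a function that
takes the entries and the options map and returns the new entries plus a list
of errors.  Below is a rough sketch of that shape for adding missing account
declarations.  The Open and Txn types are simplified stand-ins I made up, not
the real Beancount directives, and add_missing_opens is a hypothetical name.

```python
import datetime
from collections import namedtuple

# Simplified stand-ins for Beancount directives (hypothetical).
Open = namedtuple("Open", "date account")
Txn = namedtuple("Txn", "date accounts")

def add_missing_opens(entries, options_map=None):
    """Prepend an Open for every account used but never declared."""
    opened = {e.account for e in entries if isinstance(e, Open)}
    first_seen = {}
    for e in entries:
        if isinstance(e, Txn):
            for account in e.accounts:
                if account not in opened:
                    # Open the account on the earliest date it appears.
                    first_seen[account] = min(first_seen.get(account, e.date), e.date)
    new_opens = [Open(date, account) for account, date in sorted(first_seen.items())]
    return new_opens + list(entries), []  # (entries, errors)

d = datetime.date
entries = [
    Open(d(2023, 1, 1), "Assets:Bank"),
    Txn(d(2023, 7, 2), ["Assets:Bank", "Expenses:Food"]),
    Txn(d(2023, 7, 1), ["Expenses:Food", "Assets:Cash"]),
]
new_entries, errors = add_missing_opens(entries)
```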

Thank you!

eric


>
> Cheers,
> Dan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Beancount" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beancount/3fdc241b-1fae-062b-22c6-42b718bd00cf%40grinta.net
> .
>
