Re: [DISCUSS] Refactoring Import Errors

Kamil Breguła Thu, 17 Sep 2020 01:44:39 -0700

In some cases we do not have a DAG ID; we only have the path to the
file.


In my opinion, we can do it now. One of the most important changes is to
ensure a stable ID. Currently we delete all the errors and add them again;
instead, we should check which errors should be deleted and which should
be added. If we have a stable ID, then we can add new metadata to the row.
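
To illustrate the stable-ID idea, here is a minimal sketch in plain Python.
It is not Airflow code: ParseError, stable_id and reconcile are hypothetical
names, and building the ID from the file path plus the exception class name
is only an assumption (the original proposal suggested a file hash instead).
Rows that are still failing are left untouched, so metadata such as a
"notified" flag survives between scheduler loops.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ParseError:
    """Hypothetical stand-in for a row in the import errors table."""
    filename: str
    exception_class: str
    stacktrace: str
    notified: bool = False  # metadata that must survive between loops


def stable_id(filename: str, exception_class: str) -> str:
    """Derive a deterministic ID (assumption: file path + exception class)."""
    raw = f"{filename}\0{exception_class}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def reconcile(stored: dict, current: list):
    """Compare stored rows (keyed by stable ID) with the errors found now.

    Returns (to_add, to_delete): a dict of new rows keyed by stable ID and
    a list of stable IDs whose errors no longer occur. Rows present in both
    are not touched, so their metadata is preserved.
    """
    current_by_id = {
        stable_id(err.filename, err.exception_class): err for err in current
    }
    to_add = {k: v for k, v in current_by_id.items() if k not in stored}
    to_delete = [k for k in stored if k not in current_by_id]
    return to_add, to_delete
```

With this shape, an alert would only need to be sent for the rows in to_add,
and the existing table could carry the extra column instead of a separate
history table.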

Why do we need a new table - import_errors_history? Can't we use the
current table?



On Thu, Sep 17, 2020 at 10:01 AM Jarek Potiuk <[email protected]>
wrote:

> I am all for it.
>
> This should be - likely - connected with the future versioning of DAGs
> (currently deferred to 2.1).
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
> - possibly, rather than being a separate AIP, it should be
> incorporated there.
>
> I believe in the versioning implementation we will already have a
> table where we will keep information about DAGs together with their
> hash, so it seems natural that such "errors" should be connected to
> such "DAG_ID" "HASH" combination.
>
> And I would love to change the name of it to "parse errors". "Import
> errors" suggests that those are errors that come from a wrong "import"
> statement. But we are really talking about any kind of parsing error.
>
> J.
>
> On Wed, Sep 16, 2020 at 3:48 AM Jacob Ferriero
> <[email protected]> wrote:
> >
> > Hello Airflow Dev List,
> >
> > I'm considering proposing a refactor to import errors in order to support
> > sending alert emails when the scheduler finds an import error (but not
> > every time the scheduler finds the same import error). This is currently
> > not possible because the import errors are cleared during each scheduler
> > loop.
> >
> > I'd like to poll the community for perspectives on other shortcomings
> > of the import error model before proposing a refactor, or other use
> > cases folks might have for such a refactor (e.g. supporting an arbitrary
> > callback function similar to SLA miss).
> >
> > My current thought is to just add an import_errors_history table to the
> > database that is not cleared on each scheduler loop and keeps track, in
> > a boolean field, of whether an email was sent. The primary key could be
> > constructed from a file hash and exception classname.
> >
> > Does this one use case warrant a new table? Should we just replace the
> > import_errors table in place?
> >
> > If I can get a sense of high-level direction I can put together an
> > AIP / PR.
> >
> > Cheers,
> > Jake
> >
> > --
> >
> > *Jacob Ferriero*
> >
> > Strategic Cloud Engineer: Data Engineering
> >
> > [email protected]
> >
> > 617-714-2509
>
>
>
> --
>
> Jarek Potiuk
> Polidea | Principal Software Engineer
>
> M: +48 660 796 129
>
