Re: [DISCUSS] Refactoring Import Errors

Jarek Potiuk Thu, 17 Sep 2020 02:53:42 -0700

True, Kamil. We can do it now, and the file vs. DAG distinction is indeed
an important one!
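
To make the stable-ID point concrete, here is a minimal sketch of the kind of
reconciliation Kamil describes below. It is keyed by file path, since a DAG ID
is not always available. This is only an illustration under my own
assumptions: `ParseErrorRow`, `reconcile` and the `first_seen` field are
hypothetical names, not existing Airflow code, and a plain dict stands in for
the database table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class ParseErrorRow:
    """Hypothetical stand-in for one stored error row, keyed by file path."""
    filename: str
    stacktrace: str
    # Extra metadata that only makes sense if the row survives between loops.
    first_seen: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def reconcile(
    stored: Dict[str, ParseErrorRow], current: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Update `stored` in place from `current` (file path -> stacktrace).

    Returns (added, removed) file paths. Rows for files that are still
    broken are kept rather than deleted and re-added, so their
    first_seen value is preserved.
    """
    added = [path for path in current if path not in stored]
    removed = [path for path in stored if path not in current]
    for path in removed:
        del stored[path]
    for path, stacktrace in current.items():
        if path in stored:
            stored[path].stacktrace = stacktrace  # same row, refreshed trace
        else:
            stored[path] = ParseErrorRow(path, stacktrace)
    return added, removed
```

With something like this, only the `added` paths would be candidates for a
new alert, and the rows that stay can carry whatever metadata we decide to
add later.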


J


On Thu, Sep 17, 2020 at 10:44 AM Kamil Breguła <[email protected]>
wrote:

> In some cases we do not have a DAG ID, only the path to the file.
>
> In my opinion, we can do it now. One of the most important changes is to
> ensure a stable ID. Currently we delete all the errors and add them again;
> instead, we should check which errors should be deleted and which should be
> added. If we have a stable ID then we can add new metadata to the row.
>
> Why do we need a new table - import_errors_history? Can't we use the
> current table?
>
>
>
> On Thu, Sep 17, 2020 at 10:01 AM Jarek Potiuk <[email protected]>
> wrote:
>
> > I am all for it.
> >
> > This should be - likely - connected with the future versioning of DAGs
> > (currently deferred to 2.1).
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
> > - possibly, rather than being a separate AIP, it should be
> > incorporated there.
> >
> > I believe that in the versioning implementation we will already have a
> > table where we keep information about DAGs together with their hash,
> > so it seems natural that such "errors" should be connected to that
> > "DAG_ID" + "HASH" combination.
> >
> > And I would love to change the name of it to "parse errors". "Import
> > errors" suggests that those are errors that come from a wrong "import"
> > statement. But we are really talking about any kind of parsing error.
> >
> > J.
> >
> > On Wed, Sep 16, 2020 at 3:48 AM Jacob Ferriero
> > <[email protected]> wrote:
> > >
> > > Hello Airflow Dev List,
> > >
> > > I'm considering proposing a refactor to import errors in order to
> > > support sending alert emails when the scheduler finds an import error
> > > (but not every time the scheduler finds the same import error). This is
> > > currently not possible because the import errors are cleared during
> > > each scheduler loop.
> > >
> > > Before proposing a refactor, I'd like to poll the community for
> > > perspectives on other shortcomings of the import error model, or other
> > > use cases folks might have for such a refactor (e.g. supporting an
> > > arbitrary callback function similar to SLA miss).
> > >
> > > My current thought is to just add an import_errors_history table to the
> > > database that is not cleared on each scheduler loop and keeps track of
> > > whether an email was sent in a boolean field. The primary key could be
> > > constructed from a file hash and exception classname.
> > >
> > > Does this one use case warrant a new table? Should we just replace the
> > > import_errors table in place?
> > >
> > > If I can get a sense of high-level direction I can put together an
> > > AIP / PR.
> > >
> > > Cheers,
> > > Jake
> > >
> > > --
> > >
> > > *Jacob Ferriero*
> > >
> > > Strategic Cloud Engineer: Data Engineering
> > >
> > > [email protected]
> > >
> > > 617-714-2509
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea | Principal Software Engineer
> >
> > M: +48 660 796 129
> >
>
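
For reference, Jake's idea above (a key built from a file hash and exception
class name, plus a boolean "email sent" field) could be sketched roughly as
follows. This is an illustration only, under my own assumptions: `alert_key`
and `should_send_email` are hypothetical helpers, SHA-256 is just one possible
choice of hash, and a dict stands in for the proposed history table.

```python
import hashlib
from typing import Dict, Tuple

# (hex digest of the file content, exception class name)
AlertKey = Tuple[str, str]


def alert_key(file_content: bytes, exc: BaseException) -> AlertKey:
    """Build the proposed key from the file content and the exception type."""
    return hashlib.sha256(file_content).hexdigest(), type(exc).__name__


def should_send_email(history: Dict[AlertKey, bool], key: AlertKey) -> bool:
    """Return True only the first time a key is seen, and mark it as sent.

    `history` maps each key to its "email sent" flag and is never cleared
    between scheduler loops.
    """
    if history.get(key, False):
        return False
    history[key] = True
    return True
```

One consequence of keying on the file hash: any edit to the file produces a
new key, so the same exception would trigger a fresh email after every change
to a still-broken file. Keying on the file path instead would alert only once
per file and exception type.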


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>