True, Kamil. We can do it now, and the file vs. DAG difference is indeed an important one!
J

On Thu, Sep 17, 2020 at 10:44 AM Kamil Breguła <[email protected]> wrote:

> Only in some cases, we do not have a DAG ID, but only have the path to the
> file.
>
> In my opinion, we can do it now. One of the most important changes is to
> ensure a stable ID. Now we delete and add the errors again, and we should
> check which errors should be deleted and which should be added. If we have
> a stable ID then we can add new metadata to the row.
>
> Why do we need a new table - import_errors_history? Can't we use the
> current table?
>
> On Thu, Sep 17, 2020 at 10:01 AM Jarek Potiuk <[email protected]>
> wrote:
>
> > I am all for it.
> >
> > This should be - likely - connected with the future versioning of DAGs
> > (currently deferred to 2.1).
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
> > - possibly, rather than being a separate AIP, it should be
> > incorporated there.
> >
> > I believe in the versioning implementation we will already have a
> > table where we will keep information about DAGs together with their
> > hash, so it seems natural that such "errors" should be connected to
> > such "DAG_ID" "HASH" combination.
> >
> > And I would love to change the name of it to "parse errors". "Import
> > errors" suggests that those are errors that come from a wrong "import"
> > statement. But we are really talking about any kind of parsing error.
> >
> > J.
> >
> > On Wed, Sep 16, 2020 at 3:48 AM Jacob Ferriero
> > <[email protected]> wrote:
> > >
> > > Hello Airflow Dev List,
> > >
> > > I'm considering proposing a refactor to import errors in order to
> > > support sending alert emails when the scheduler finds an import error
> > > (but not every time the scheduler finds the same import error). This
> > > is currently not possible because the import errors are cleared during
> > > each scheduler loop.
> > >
> > > I'd like to poll the community for perspectives on other shortcomings
> > > of the import error model before proposing a refactor or other use
> > > cases folks might have for such a refactor (e.g. supporting an
> > > arbitrary callback function similar to SLA miss).
> > >
> > > My current thought is to just add an import_errors_history table to
> > > the database that is not cleared on each scheduler loop and does keep
> > > track of if an email was sent in a boolean field. The primary key
> > > could be constructed from a file hash and exception classname.
> > >
> > > Does this one use case warrant a new table? Should we just replace the
> > > import_errors table in place?
> > >
> > > If I can get a sense of high-level direction I can put together an
> > > AIP / PR.
> > >
> > > Cheers,
> > > Jake
> > >
> > > --
> > >
> > > *Jacob Ferriero*
> > >
> > > Strategic Cloud Engineer: Data Engineering
> > >
> > > [email protected]
> > >
> > > 617-714-2509
> >
> > --
> >
> > Jarek Potiuk
> > Polidea | Principal Software Engineer
> >
> > M: +48 660 796 129

--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
