OK so we have a consensus of 3. Should I create a voting thread for this? Namely, formally deprecating non-json-serializable params, for removal in 3.0?
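
To make the scope of the deprecation concrete, here is a rough sketch of how "JSON-serializable" could be checked for a params dict. It uses only the standard `json` module. The helper names and the example params are made up for illustration; this is not how Airflow implements it.

```python
import json
from datetime import datetime, timedelta


def is_json_serializable(value):
    """Return True if json.dumps accepts the value as-is."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


def non_serializable_params(params):
    """Return the sorted names of params that would fall under the deprecation."""
    return sorted(
        name for name, value in params.items() if not is_json_serializable(value)
    )


# Hypothetical DAG params, mixing plain data with the cases raised in the thread.
example_params = {
    "retries": 3,
    "tags": ["a", "b"],
    "as_set": {"a", "b"},              # set: rejected by json.dumps
    "start": datetime(2021, 11, 11),   # datetime: rejected by json.dumps
    "delay": timedelta(minutes=5),     # timedelta: rejected by json.dumps
    "callback": print,                 # callable: "code", not "data"
}

print(non_serializable_params(example_params))
```

Under this check a set would have to be passed as a list, which is how it would look in serialized form anyway.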

On Thu, Nov 11, 2021 at 2:40 PM Jarek Potiuk wrote:

> Indeed - if we want to really "deprecate" (and drop in 3.0) support
> for non-serializable params, then JSON serialization is the way to go.
>
> The only "benefit" of using YAML is "set" support but if we are going
> to "deprecate" non-serializable params, then we can easily include
> "set" in being "non-serializable" and use JSON. There is a good reason
> why JSON does not support sets (because in serialized form it is
> exactly the same as list - there is no difference, really)
>
> J.
>
> On Thu, Nov 11, 2021 at 9:24 PM Kaxil Naik wrote:
> >
> > -1 for breaking it again. We should go ahead with a deprecation route.
> > JSON serializable makes sense, I am not fully convinced that YAML
> > serializable is any better!
> >
> > Another note is that the current params, which Daniel fixed, now use
> > Serialization from our DAG Serialization -
> > https://github.com/apache/airflow/blob/7622f5e08261afe5ab50a08a6ca0804af8c7c7fe/airflow/serialization/serialized_objects.py#L289-L330
> > so it currently supports Timedelta, Timezone, Datetime, Tuple etc objects.
> >
> > But I agree with Daniel that we should deprecate and only support JSON
> > Serializable objects to make it fully featured like overriding it via CLI,
> > API and Webserver.
> >
> > Regards,
> > Kaxil
> >
> > On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish wrote:
> >>
> >> Yeah I agree with you.
> >>
> >> The one other thing I'll mention is the other use case that was raised
> >> in an issue was `datetime` which like set is also not json-serializable,
> >> but unlike set would probably not be yaml-serializable.
> >>
> >> But yeah let's see if others can help establish a consensus.
> >>
> >> Small note: another thing sort of in support of your position is that,
> >> if you can't override the param from UI and CLI and these other means
> >> (because it's not expecting something that can be serialized that way),
> >> then you don't even need it to be a `param` but it could just as easily
> >> be an operator or task arg instead. I.e. if you're staying in python you
> >> can use python; but what's special about params is they can be set from
> >> outside. The other side of this though is that arbitrary dag params
> >> probably _did_ work with trigger dag run operator.
> >>
> >>
> >> On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk wrote:
> >>>
> >>> > So you would say in 2.2.3 we "break" that again? Not wait for 3.0
> >>> > because, even if it was perhaps an accident, support was there?
> >>>
> >>> Yep. If others agree this is the way to go, I'd be happy to. We had
> >>> some other changes that worked "accidentally" but were never stated
> >>> that they work this way. I think it's a pretty good assumption (even
> >>> if it is implicit) that "params" set for dag triggering are "data" and
> >>> not "code". It could be python callable of course, but I think it's
> >>> kinda "abuse" - especially that it excludes triggering via CLI/UI.
> >>>
> >>> The thing is that we do not "specify" what is our "stable API" and
> >>> what is not also in many other places, there is a certain ambiguity
> >>> for some of them. Of course it's not only whether it is "specified" or
> >>> "not", it's also much more "whether a lot of people could interpret
> >>> and use it in this way". I think (but maybe others can chime in) - it
> >>> would be reasonable to assume that using callables or other Python
> >>> code is not expected when we already have:
> >>> a) CLI with string/JSON input
> >>> b) UI with string/JSON input
> >>> c) ability of using JSON-schema to actually verify the parameters
> >>> (this last one actually shows a clear intention of having those
> >>> parameters "data only").
> >>>
> >>> J.
