Yeah 100% I have nothing against yaml-serializable and it sounds like a good idea. But my main issue is arbitrary python.
So you would say in 2.2.3 we "break" that again? Not wait for 3.0 because, even if it was perhaps an accident, support was there? On Thu, Nov 11, 2021, 12:14 AM Jarek Potiuk <[email protected]> wrote: > I can't really imagine a good way of supporting "dynamic python" via > external parameters (pickling AAAAAAARGH!). So still I think your idea > of deprecating non-serializable parameters is good. > > True - YAML serialization does not support "all-python", but it does > support "all data structures". But there is one difference vs > "json-serializable". > > But in this case "non YAML serializable" as opposed to "non-json > serializable" check we could introduce it very fast - even at patch > level. We do not have to wait for Airflow 3 IMHO. We already know that > people are using "sets" and this basically means that we have to go > the "deprecation" route. On the other hand allowing arbitrary dynamic > python parameters and not data structures should be considered as > "accidental" (it was never an intention) and not really part of the > public API, so we can simply "fix it" in 2.2.3 by requiring the > parameters to be YAML-serializable. > > J. > > > > On Wed, Nov 10, 2021 at 11:11 PM Daniel Standish <[email protected]> > wrote: > > > > OK but if you extend to yaml, you don't resolve the problem but only > move the goalpost a little bit. Because then the question remains; enforce > yaml-serializable or arbitrary python (which is the soon-to-be status quo > after 2.2.2 is released). > > > > On Wed, Nov 10, 2021 at 1:26 PM Jarek Potiuk <[email protected]> wrote: > >> > >> Very good question. > >> > >> Maybe we can solve it differently. We could recommend JSON and the > >> basic datasets that JSON supports, but it should be possible for > >> Airflow to support YAML as input format for params. > >> > >> Every JSON is also valid YAML and you can use Yaml parser to parse > >> JSON. Yaml supports sets (though the syntax is cumbersome) > >> https://yaml.org/type/set.html . So if we support YAML for both API > >> and UI, if you REALLY want to use set - you will be able to . > >> > >> My proposal is that all the examples etc. could show JSON and we > >> should support it, but there should be an asterisk (*) that if you > >> want advanced features like YAML, you could use YAML. Then we would > >> not have to deal with deprecation which is problematic as we could > >> only remove this feature in Airflow 3. > >> > >> J. > >> > >> On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <[email protected]> > wrote: > >> > > >> > Prior to 2.2.0, you could use non-json-serializable params in a dag. > Here's an example with `set`: > >> > > >> > @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]}) > >> > def set_param(intersection): > >> > print(intersection) > >> > > >> > > >> > set_param("{{ params.a.intersection(params.b).pop() }}") > >> > > >> > In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed. > >> > > >> > There's a problem though. An important part of params is the ability > to override them when triggering a dag from the web UI or CLI. But params > supplied through either mechanism should be json-serializable. > >> > > >> > So on one hand we allow arbitrary params, and on the other hand we > require them to be json serializable. We can see how this causes a problem > in the above example: if you provide `a` as a `list` in the UI, it won't > have the method `intersection`. > >> > > >> > So the question: should we deprecate non-json-serializable params? > >> > > >> > My feeling is yes. But I'm not sure how widely this flexibility is > used in the wild. > >> > >
