Oh, it only matters if core.dag_run_conf_overrides_params is set in the config, where in it calls
https://github.com/apache/airflow/blob/86909de47c40afacb79aeb778bbf2784b0b5c7f6/airflow/models/taskinstance.py#L1450-L1452: def overwrite_params_with_dag_run_conf(self, params, dag_run): if dag_run and dag_run.conf: params.update(dag_run.conf) (params is initialized to `{}`) But since it _can_ matter, I would say we should enforce and require it to be an object (or string-containing an object) -ash On May 29 2020, at 3:42 pm, Tomasz Urbaszek <turbas...@apache.org> wrote: > Why? Do we have any recommendations / docs for this feature? All I can > find is "JSON string that gets pickled into the DagRun’s conf > attribute". > > > T. > > > On Fri, May 29, 2020 at 4:33 PM Ash Berlin-Taylor <a...@apache.org> wrote: >> >> That should _not_ be allowed, it must be a dict/object. No array, or >> other primitive at the top level. >> >> -ash >> >> >> On May 29 2020, at 3:27 pm, Tomasz Urbaszek <turbas...@apache.org> wrote: >> >> > Do I correctly understand that users can specify conf="[1,2,3]" as >> > this is >> > also a valid JSON? >> > >> > T. >> > >> > On Fri, May 29, 2020 at 4:18 PM Kamil Breguła <kamil.breg...@polidea.com> >> > wrote: >> > >> >> The problem is with mixing objects that have a known structure with >> >> objects that do not have this structure. Common JSON parser in Java >> >> can generate objects if all the attributes have a known type. >> >> Attributes that are unrecognized are ignored because Java cannot >> >> dynamically generate classes. This problem is very visible when >> >> reading a list of objects from the API. >> >> >> >> For example. We have the following DAGRun. >> >> DagRun(conf='{"A":"B"}') >> >> DagRun(conf='{"C":"D"}') >> >> They can be represented by the following JSON objects. >> >> {"conf": {"A":"B"}} >> >> {"conf": {"C":"D"}} >> >> >> >> A type representing the DAG Run is automatically generated based on >> >> the API specification. I prepared an example that shows such a >> type in >> >> many different technologies for AJava >> >> I used official library generators for OpenAPI >> >> https://github.com/OpenAPITools/openapi-generator >> >> Example class definitions >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L64 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-feign/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey1/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey2/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-microprofile/src/main/java/org/openapitools/client/model/DAGRun.java#L64 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-native/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L63 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-rest-assured/src/main/java/org/openapitools/client/model/DAGRun.java#L64 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resteasy/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resttemplate/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-retrofit2/src/main/java/org/openapitools/client/model/DAGRun.java#L63 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-vertx/src/main/java/org/openapitools/client/model/DAGRun.java#L65 >> >> >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-webclient/src/main/java/org/openapitools/client/model/DAGRun.java#L64 >> >> None allowed to generate a class structure that allows reading the >> >> attributes contained in the conf field if type == object. >> >> >> >> Libraries for other languages and with different fields are also >> >> available in the repository. >> >> >> >> As an object: >> >> Patch: >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/spec-object.patch >> >> Libraries: >> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/object >> >> >> >> Two field types at the same time: >> >> Patch: >> >> https://github.com/mik-laj/airflow-api-clients/blob/master/spec-oneoff.patch >> >> Libraries: >> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/oneoff >> >> >> >> Without modification (string): >> >> Libraries: >> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/string >> >> >> >> >> >> I have the biggest concerns about reading data, not about sending data >> >> so in my opinion we should think about how we want the response from >> >> the server to look like? >> >> >> >> GET //dags/{DAG_ID}/dagRuns/ >> >> >> >> { >> >> "dag_runs": [ >> >> {"conf": {"A":"B"}} >> >> {"conf": {"C":"D"}} >> >> ] >> >> } >> >> >> >> or >> >> "dag_runs": [ >> >> {"conf": "{\"A\":\"B\"}} >> >> {"conf": "{\"C\":\"D\"}} >> >> ] >> >> } >> >> >> >> If we choose one output format, we can think about whether we want to >> >> make changes to this object structure so that the input object does >> >> not match the output object. In my opinion, this is not a good idea. >> >> REST is based on resources/types that have a unified interface and >> >> fields that are ambiguous can be problematic for users. >> >> >> >> In my opinion, the problem is that the field format and the message >> >> format is JSON. This is just a coincidence. In the future, we may add >> >> a different message format, e.g. protobuff, and we will not be >> able to >> >> provide common schema. We will have to introduce discrepancies between >> >> spec for protobuff and spec for JSON when we change only transform >> >> format. >> >> >> >> On Fri, May 29, 2020 at 12:08 PM Ash Berlin-Taylor >> <a...@apache.org> wrote: >> >> > >> >> > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> >> >> wrote: >> >> > > *Root problem* >> >> > > >> >> > > I think the root problem is not in the interface/API but the ambiguity >> >> of >> >> > > the DagRun.conf field. The way it is defined now, It can actually >> >> be an >> >> > > arbitrary object, not even a dictionary. >> >> > >> >> > So that is a gap in the definition. It _needs_ to be a dictionary/JSON >> >> > *Object* specifically, else it will break when it is used. >> >> > >> >> > > but in case of the API, it is just a tool that is used to pass the >> >> > > data, and there should be some well-defined and fixed logic >> about the >> >> > > structure >> >> > >> >> > My view is that specifying it as a JSON object in the spec (but saying >> >> > nothing about the properties it might contain) is more precise >> than a >> >> > string. Compare these two snippets of OpenAPI spec: >> >> > >> >> > >> >> > conf: >> >> > type: string >> >> > description: > >> >> > JSON string (which must contain a top-level JSON object) >> >> > describing additional configuration parameters. >> >> > >> >> > vs >> >> > >> >> > conf: >> >> > type: object >> >> > description: > >> >> > JSON object describing additional configuration parameters. >> >> > >> >> > >> >> > The first one means technically, acording to the spec you can >> >> submit `{ >> >> > "conf": "abc" }` -- but that will fail. Where as the second >> means that >> >> > both the OpenAPI client, and the automatic validation from >> >> Connexion on >> >> > the server will handle that for us. >> >> > >> >> > (With the string case, we would have to JSON deserialize it and check >> >> > it's an JSON object/ python dict. Nothing else is allowed.) >> >> > >> >> > > *Java case (and other static-typed languages)* >> >> > >> >> > import com.google.gson.Gson; >> >> > import com.google.gson.JsonObject; >> >> > >> >> > class Demo { >> >> > >> >> > public static class Employee >> >> > { >> >> > private Integer id; >> >> > private String firstName; >> >> > private String lastName; >> >> > >> >> > public Employee(Integer id, String firstName, String lastName){ >> >> > this.id = id; >> >> > this.firstName = firstName; >> >> > this.lastName = lastName; >> >> > } >> >> > } >> >> > >> >> > public static void main(String args[]) { >> >> > >> >> > Demo.Employee employee = new Demo.Employee(1, "Ash", >> >> "Berlin-Taylor"); >> >> > >> >> > Gson gson = new Gson(); >> >> > >> >> > JsonObject e = gson.toJsonTree(employee).getAsJsonObject(); >> >> > >> >> > // Just for debugging/testing >> >> > System.out.println(e.toString()); >> >> > >> >> > } >> >> > } >> >> > >> >> > which outputs >> >> > >> >> > {"id":1,"firstName":"Ash","lastName":"Berlin-Taylor"} >> >> > >> >> > >> >> > Entirely possible to create arbitrary JSON structures in Java. >> (You can >> >> > also just create a com.google.gson.JsonObject directly and call >> >> > `.add()`) >> >> > >> >> > To use your example DagRun class, the signature could become: >> >> > >> >> > public DagRun(String dagRunId, String executionDate, Object conf); >> >> > >> >> > and then the client can do `gson.toJsonTree(conf).getAsJsonObject()` >> >> internally. >> >> > >> >> > >> >> > So I'm still +1 for dict directly. Even more so now I have >> written this >> >> > in Java. >> >> > >> >> > JSON-encoded-string in a JSON api is a "code smell" to me. >> >> > >> >> > -ash >> >> > >> >> > >> >> > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> >> >> wrote: >> >> > >> >> > > And one more comment to add - this is the DagRun class I wrote >> >> for the >> >> > > imaginary Java client. That's all (including the serializer >> from the >> >> > > previous mail) needed to be able to easily build the POST >> method call >> >> in >> >> > > Java. >> >> > > I would really like to challenge someone more experienced with Java >> >> writes >> >> > > or provides some examples, either the class holding arbitrary complex >> >> conf >> >> > > object or (if we stick to String way of storing the String) - to >> >> > > serialize/deserialise the object from son. >> >> > > >> >> > > I really believe it is far from trivial and by choosing "object" >> >> way of >> >> > > sending the conf, we significantly increase the complexity of clients >> >> > > accessing Airflow API (which should be our goal) at the >> expense of >> >> > > "slightly" less readable code. >> >> > > >> >> > > J. >> >> > > >> >> > > >> >> > > public static class DagRun { >> >> > > private final String dagRunId; >> >> > > private final String executionDate; >> >> > > private final String conf; >> >> > > >> >> > > public String getDagRunId() { >> >> > > return dagRunId; >> >> > > } >> >> > > >> >> > > public String getExecutionDate() { >> >> > > return executionDate; >> >> > > } >> >> > > >> >> > > public String getConf() { >> >> > > return conf; >> >> > > } >> >> > > >> >> > > public DagRun(String dagRunId, String executionDate, String >> >> conf){ >> >> > > this.dagRunId = dagRunId; >> >> > > this.executionDate =executionDate; >> >> > > this.conf = conf; >> >> > > } >> >> > > } >> >> > > >> >> > > J. >> >> > > >> >> > > >> >> > > On Fri, May 29, 2020 at 8:40 AM Jarek Potiuk <jarek.pot...@polidea.com >> >> > >> >> > > wrote: >> >> > > >> >> > >> After giving it quite some time to try and think about it this >> >> morning, >> >> > >> and looking at consequences - I am in strong favour of >> passing string >> >> as >> >> > >> conf (Kamil's proposal). >> >> > >> >> >> > >> I don't think the dictionary is good. And trying to >> accommodate both >> >> > >> is I >> >> > >> think combining the worst of both worlds. Let me explain why. >> >> > >> >> >> > >> *Root problem* >> >> > >> >> >> > >> I think the root problem is not in the interface/API but the >> >> > >> ambiguity of >> >> > >> the DagRun.conf field. The way it is defined now, It can >> >> actually be >> >> an >> >> > >> arbitrary object, not even a dictionary. I could pass it an >> >> array, or >> >> > >> bool, or pretty much any serializable object it seems. Except >> >> for a >> >> few >> >> > >> tests where both "string" and "dict" are accepted in the old >> >> experimental >> >> > >> API I do not see anywhere (please correct me if I am wrong) >> any kind >> >> of >> >> > >> specification for the conf object. That indeed makes it rather >> >> hard to >> >> > >> reason about it in statically typed languages. And while I understand >> >> why >> >> > >> we need it and why in Python environment we are fairly >> relaxed about >> >> this >> >> > >> (JINJA for example) - I do not think this should influence the >> >> complexity >> >> > >> of the API "structure". >> >> > >> >> >> > >> *API Structure: * >> >> > >> >> >> > >> I think JSON structure in the API should be fixed and well >> >> defined. I >> >> > >> think it is much better to be very explicit about the "conf" >> >> > >> parameter that >> >> > >> it is "string JSON representation" than the arbitrary object. JSON >> >> > >> is, of >> >> > >> course, dynamic but in case of the API, it is just a tool >> that is >> >> > >> used to >> >> > >> pass the data, and there should be some well-defined and >> fixed logic >> >> about >> >> > >> the structure.Precisely to make it easier to parse and >> prepare. I >> >> > >> have a >> >> > >> feeling that putting that dynamic nature of JSON into API structure >> >> > >> definition is quite an abuse of that dynamic nature. >> >> > >> >> >> > >> *Java case (and other static-typed languages)* >> >> > >> >> >> > >> I haven't written Java for years, but I decided to give it a >> >> try. I >> >> tried >> >> > >> to see how complex it would be to write serialization/deserialization >> >> for >> >> > >> that using one of the most common parsers in Java world - Gson. >> >> > >> >> >> > >> The string case is super simple: >> >> > >> >> >> > >> public JsonElement serialize(final DagRun dagRun, >> final Type >> >> type, >> >> > >> final JsonSerializationContext context) { >> >> > >> JsonObject result = new JsonObject(); >> >> > >> result.add("dag_run_id", new >> >> > >> JsonPrimitive(dagRun.getDagRunId())); >> >> > >> result.add("execution_date", new >> >> > >> JsonPrimitive(dagRun.getExecutionDate())); >> >> > >> result.add("conf",new JsonPrimitive(dagRun.getConf())); >> >> > >> return result; >> >> > >> } >> >> > >> >> >> > >> The dynamic object of arbitrary complexity I do not even know >> >> how to >> >> > >> approach. Eventually what I would have to do is to convert it >> to JSON >> >> > >> String anyway - because that's the only way you can keep arbitrary >> >> complex >> >> > >> structure in Java. >> >> > >> >> >> > >> *Deferring problem?* >> >> > >> >> >> > >> Also - I do not think we are deferring the problem for later. The >> >> > >> thing is >> >> > >> that the only entity that cares about the "content" of the >> conf being >> >> > >> accessible as an object is the Python DAG reading the conf object >> >> (likely >> >> > >> with some JINJA templates). >> >> > >> >> >> > >> We will likely never, ever have to parse, de-jsonize the >> string on >> >> the >> >> > >> client-side. We will just have to prepare it (to send) and possibly >> >> display >> >> > >> it (if we ever read that conf via API). >> >> > >> The imaginary client communicating with Airflow will simply pass >> >> whatever >> >> > >> the User will tell it to do. And IMHO this is far easier if we >> >> do not >> >> have >> >> > >> to convert it to object on the flight . >> >> > >> >> >> > >> I can imagine several use cases for that method: >> >> > >> >> >> > >> 1) User types - by hand - the whole "conf" object to pass to the >> >> trigger >> >> > >> method. Likely typing JSON-string directly. That's a typical >> >> case for >> >> any >> >> > >> kind of CLI I can imagine - where usually you pass JSON >> string for >> >> > >> arbitrary complex objects. >> >> > >> >> >> > >> 2) The client-side implementation will define some limited >> set of >> >> > >> parameters that could be used by the user and let the user >> enter it >> >> > >> via a >> >> > >> form. Based on that a "conf" object will be created following >> >> predefined >> >> > >> structure (specific for this case). For example, the user >> >> chooses a >> >> > >> date to >> >> > >> run, and the form produces {"date: >> "${DATE_CHOSEN_BY_THE_USER}" } >> >> object. >> >> > >> >> >> > >> 3) Theoretically, it's possible that user enters arbitrary complex >> >> OBJECT >> >> > >> via any kind of structured "generic" interface. That would be a >> >> nightmare, >> >> > >> however (both from the user and developer point of view). So I >> >> disregard >> >> > >> that option. >> >> > >> >> >> > >> In case of 1) we would have to parse the data ??? and turn it into >> >> > >> object to serialize. Python can do that, but Java can't >> easily. And >> >> > >> well - >> >> > >> there is no point in doing it - we cannot do anything with >> conf as >> >> object, >> >> > >> we have no idea if it is valid or not, we have to anyhow pass >> it to >> >> > >> DAG so >> >> > >> that DAG can access whatever fields are needed. I think we >> would be >> >> better >> >> > >> to pass it as the very string user entered (After we check if >> >> this is >> >> a >> >> > >> valid json - which we can do easily). >> >> > >> >> >> > >> In case of 2) it's also super easy to turn such pre-defined structure >> >> into >> >> > >> JSON string. It's trivial in any language. There is virtually no >> >> > >> benefit of >> >> > >> passing it as object. - the "slightly" better readabilty >> maybe the >> >> > >> only one >> >> > >> >> >> > >> Of course - as Ash and Kaxil mentioned, we could pass both - >> >> string or >> >> > >> dict, but I think that is very wrong. Should we also allow array >> >> > >> (this is a >> >> > >> valid json structure)? Should we allow passing standalone >> bool, int >> >> .. >> >> > >> objects (they are valid json). Also how about sending >> "STRING" as >> >> conf >> >> > >> value (is it string or is it json-encoded object). This is a >> >> bad, bad >> >> idea. >> >> > >> >> >> > >> So summarizing - I am strongly for passing "string" rather than >> >> object. >> >> > >> >> >> > >> J. >> >> > >> >> >> > >> >> >> > >> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> >> >> wrote: >> >> > >> >> >> > >>> If the only problem is with Java and not any other popular >> >> > >>> languages, I >> >> > >>> would say we go for Option (2). >> >> > >>> >> >> > >>> If not, supporting both is a good idea. >> >> > >>> >> >> > >>> Regards, >> >> > >>> Kaxil >> >> > >>> >> >> > >>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote: >> >> > >>> >> >> > >>> > While I understand the difficulty of dealing with nested json >> >> without >> >> > >>> > predefined schemas, I feel like returning it as a string only >> >> delays >> >> > >>> > the problem to a later stage since the user will still >> need to >> >> parse >> >> > >>> > that string into a strongly typed data structure in order >> to read >> >> the >> >> > >>> > values. >> >> > >>> > >> >> > >>> > I don't have much experience in Java so I can't really >> >> comment on >> >> > >>> > that. But I can confirm that it's pretty straightforward >> to deal >> >> with >> >> > >>> > this in C/C++, Rust and Go. >> >> > >>> > >> >> > >>> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor >> >> > >>> > <a...@apache.org> >> >> > >>> wrote: >> >> > >>> > > >> >> > >>> > > Hi everyone, >> >> > >>> > > >> >> > >>> > > We're really close to getting the OpenAPI spec merged, >> just one >> >> last >> >> > >>> > > question that's come up around how we should handle/represent >> >> > >>> > > dagrun.conf to triggerDagRun. >> >> > >>> > > >> >> > >>> > > Which of the these two do people prefer? >> >> > >>> > > >> >> > >>> > > >> >> > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} >> >> > >>> > > Content-Type: application/json >> >> > >>> > > >> >> > >>> > > { >> >> > >>> > > "dag_run_id": "manual_2020-05-28T21:42:36Z", >> >> > >>> > > "execution_date": "2020-05-28T21:42:36Z", >> >> > >>> > > "conf": "{\"key\": \"value\" }" >> >> > >>> > > } >> >> > >>> > > >> >> > >>> > > OR >> >> > >>> > > >> >> > >>> > > >> >> > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} >> >> > >>> > > Content-Type: application/json >> >> > >>> > > >> >> > >>> > > { >> >> > >>> > > "dag_run_id": "manual_2020-05-28T21:42:36Z", >> >> > >>> > > "execution_date": "2020-05-28T21:42:36Z", >> >> > >>> > > "conf": {"key": "value"} >> >> > >>> > > } >> >> > >>> > > >> >> > >>> > > i.e. should the schema/type of conf be a (JSON-encoded) string, >> >> > >>> or an >> >> > >>> > object. >> >> > >>> > > >> >> > >>> > > I favour the later, Kamil the former. His point is that staticly >> >> typed >> >> > >>> > > languages, and Java in particular, would be hard to represent >> >> this. >> >> > >>> > > (Please correct me if I've over-simplified or >> misunderstood your >> >> > >>> > > argument Kamil) >> >> > >>> > > >> >> > >>> > > Mine was that it's easy enough in Go, for example >> >> trigger(dagRunId >> >> > >>> str, >> >> > >>> > > executionDate *time.Time, conf interface{})`, and double json >> >> encoding >> >> > >>> > > is always messy/a pain to drive manually on cURL etc. >> >> > >>> > > >> >> > >>> > > >> >> > >>> > > (Using dagRun.conf is quite rare right now, doubly so >> via the >> >> > >>> API, so >> >> > >>> I >> >> > >>> > > don't think we have any precendent to follow.) >> >> > >>> > > >> >> > >>> > > Or does anyone feel strongly that we should support >> both, and >> >> have >> >> > >>> this >> >> > >>> > > in the python side >> >> > >>> > > >> >> > >>> > > if conf: >> >> > >>> > > if isinstance(conf, dict): >> >> > >>> > > run_conf = conf >> >> > >>> > > else: >> >> > >>> > > run_conf = json.loads(conf) >> >> > >>> > > >> >> > >>> > > >> >> > >>> > > -ash >> >> > >>> > >> >> > >>> >> >> > >> >> >> > >> >> >> > >> -- >> >> > >> >> >> > >> Jarek Potiuk >> >> > >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> > >> >> >> > >> M: +48 660 796 129 <+48660796129> >> >> > >> [image: Polidea] <https://www.polidea.com/> >> >> > >> >> >> > >> >> >> > > >> >> > > -- >> >> > > >> >> > > Jarek Potiuk >> >> > > Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> > > >> >> > > M: +48 660 796 129 <+48660796129> >> >> > > [image: Polidea] <https://www.polidea.com/> >> >> > > >> >> >> > >