Why? Do we have any recommendations / docs for this feature? All I can
find is "JSON string that gets pickled into the DagRun’s conf
attribute".


T.


On Fri, May 29, 2020 at 4:33 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> That should _not_ be allowed, it must be a dict/object. No array, or
> other primitive at the top level.
>
> -ash
>
>
> On May 29 2020, at 3:27 pm, Tomasz Urbaszek <turbas...@apache.org> wrote:
>
> > Do I correctly understand that users can specify conf="[1,2,3]" as
> > this is
> > also a valid JSON?
> >
> > T.
> >
> > On Fri, May 29, 2020 at 4:18 PM Kamil Breguła <kamil.breg...@polidea.com>
> > wrote:
> >
> >> The problem is with mixing objects that have a known structure with
> >> objects that do not have this structure.  Common JSON parser in Java
> >> can generate objects if all the attributes have a known type.
> >> Attributes that are unrecognized are ignored because Java cannot
> >> dynamically generate classes. This problem is very visible when
> >> reading a list of objects from the API.
> >>
> >> For example. We have the following DAGRun.
> >> DagRun(conf='{"A":"B"}')
> >> DagRun(conf='{"C":"D"}')
> >> They can be represented by the following JSON objects.
> >> {"conf": {"A":"B"}}
> >> {"conf": {"C":"D"}}
> >>
> >> A type representing the DAG Run is automatically generated based on
> >> the API specification. I prepared an example that shows such a type in
> >> many different technologies for AJava
> >> I used official library generators for OpenAPI
> >> https://github.com/OpenAPITools/openapi-generator
> >> Example class definitions
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L64
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-feign/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey1/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey2/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-microprofile/src/main/java/org/openapitools/client/model/DAGRun.java#L64
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-native/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L63
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-rest-assured/src/main/java/org/openapitools/client/model/DAGRun.java#L64
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resteasy/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resttemplate/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-retrofit2/src/main/java/org/openapitools/client/model/DAGRun.java#L63
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-vertx/src/main/java/org/openapitools/client/model/DAGRun.java#L65
> >>
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-webclient/src/main/java/org/openapitools/client/model/DAGRun.java#L64
> >> None allowed to generate a class structure that allows reading the
> >> attributes contained in the conf field if type == object.
> >>
> >> Libraries for other languages and with different fields are also
> >> available in the repository.
> >>
> >> As an object:
> >> Patch:
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/spec-object.patch
> >> Libraries:
> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/object
> >>
> >> Two field types at the same time:
> >> Patch:
> >> https://github.com/mik-laj/airflow-api-clients/blob/master/spec-oneoff.patch
> >> Libraries:
> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/oneoff
> >>
> >> Without modification (string):
> >> Libraries:
> >> https://github.com/mik-laj/airflow-api-clients/tree/master/out/string
> >>
> >>
> >> I have the biggest concerns about reading data, not about sending data
> >> so in my opinion we should think about how we want the response from
> >> the server to look like?
> >>
> >> GET //dags/{DAG_ID}/dagRuns/
> >>
> >> {
> >> "dag_runs": [
> >> {"conf": {"A":"B"}}
> >> {"conf": {"C":"D"}}
> >> ]
> >> }
> >>
> >> or
> >> "dag_runs": [
> >> {"conf": "{\"A\":\"B\"}}
> >> {"conf": "{\"C\":\"D\"}}
> >> ]
> >> }
> >>
> >> If we choose one output format, we can think about whether we want to
> >> make changes to this object structure so that the input object does
> >> not match the output object.  In my opinion, this is not a good idea.
> >> REST is based on resources/types that have a unified interface and
> >> fields that are ambiguous can be problematic for users.
> >>
> >> In my opinion, the problem is that the field format and the message
> >> format is JSON. This is just a coincidence. In the future, we may add
> >> a different message format, e.g. protobuff, and we will not be able to
> >> provide common schema. We will have to introduce discrepancies between
> >> spec for protobuff and spec for JSON when we change only transform
> >> format.
> >>
> >> On Fri, May 29, 2020 at 12:08 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >> >
> >> > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com>
> >> wrote:
> >> > > *Root problem*
> >> > >
> >> > > I think the root problem is not in the interface/API but the ambiguity
> >> of
> >> > > the DagRun.conf field. The way it is defined now, It can actually
> >> be an
> >> > > arbitrary object, not even a dictionary.
> >> >
> >> > So that is a gap in the definition. It _needs_ to be a dictionary/JSON
> >> > *Object* specifically, else it will break when it is used.
> >> >
> >> > > but in case of the API, it is just a tool that is used to pass the
> >> > > data, and there should be some well-defined and fixed logic about the
> >> > > structure
> >> >
> >> > My view is that specifying it as a JSON object in the spec (but saying
> >> > nothing about the properties it might contain) is more precise than a
> >> > string. Compare these two snippets of OpenAPI spec:
> >> >
> >> >
> >> >         conf:
> >> >           type: string
> >> >           description: >
> >> >             JSON string (which must contain a top-level JSON object)
> >> > describing additional configuration parameters.
> >> >
> >> > vs
> >> >
> >> >         conf:
> >> >           type: object
> >> >           description: >
> >> >             JSON object describing additional configuration parameters.
> >> >
> >> >
> >> > The first one means technically, acording to the spec you can
> >> submit `{
> >> > "conf": "abc" }` -- but that will fail. Where as the second means that
> >> > both the OpenAPI client, and the automatic validation from
> >> Connexion on
> >> > the server will handle that for us.
> >> >
> >> > (With the string case, we would have to JSON deserialize it and check
> >> > it's an JSON object/ python dict. Nothing else is allowed.)
> >> >
> >> > > *Java case (and other static-typed languages)*
> >> >
> >> > import com.google.gson.Gson;
> >> > import com.google.gson.JsonObject;
> >> >
> >> > class Demo {
> >> >
> >> >   public static class Employee
> >> >   {
> >> >      private Integer id;
> >> >      private String firstName;
> >> >      private String lastName;
> >> >
> >> >      public Employee(Integer id, String firstName, String lastName){
> >> >         this.id = id;
> >> >         this.firstName = firstName;
> >> >         this.lastName = lastName;
> >> >      }
> >> >   }
> >> >
> >> >   public static void main(String args[]) {
> >> >
> >> >     Demo.Employee employee = new Demo.Employee(1, "Ash",
> >> "Berlin-Taylor");
> >> >
> >> >     Gson gson = new Gson();
> >> >
> >> >     JsonObject e = gson.toJsonTree(employee).getAsJsonObject();
> >> >
> >> >     // Just for debugging/testing
> >> >     System.out.println(e.toString());
> >> >
> >> >   }
> >> > }
> >> >
> >> > which outputs
> >> >
> >> >   {"id":1,"firstName":"Ash","lastName":"Berlin-Taylor"}
> >> >
> >> >
> >> > Entirely possible to create arbitrary JSON structures in Java. (You can
> >> > also just create a com.google.gson.JsonObject directly and call `.add()`)
> >> >
> >> > To use your example DagRun class, the signature could become:
> >> >
> >> >   public DagRun(String dagRunId, String executionDate, Object conf);
> >> >
> >> > and then the client can do `gson.toJsonTree(conf).getAsJsonObject()`
> >> internally.
> >> >
> >> >
> >> > So I'm still +1 for dict directly. Even more so now I have written this
> >> > in Java.
> >> >
> >> > JSON-encoded-string in a JSON api is a "code smell" to me.
> >> >
> >> > -ash
> >> >
> >> >
> >> > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com>
> >> wrote:
> >> >
> >> > > And one more comment to add - this is the DagRun class I wrote
> >> for the
> >> > > imaginary Java client. That's all (including the serializer from the
> >> > > previous mail) needed to be able to easily  build the POST method call
> >> in
> >> > > Java.
> >> > > I would really like to challenge someone more experienced with Java
> >> writes
> >> > > or provides some examples, either the class holding arbitrary complex
> >> conf
> >> > > object or (if we stick to String way of storing the String) - to
> >> > > serialize/deserialise the object from son.
> >> > >
> >> > > I really believe it is far from trivial and by choosing "object"
> >> way of
> >> > > sending the conf, we significantly increase the complexity of clients
> >> > > accessing Airflow API (which should be our goal) at the expense of
> >> > > "slightly" less readable code.
> >> > >
> >> > > J.
> >> > >
> >> > >
> >> > >    public static class DagRun {
> >> > >        private final String dagRunId;
> >> > >        private final String executionDate;
> >> > >        private final String conf;
> >> > >
> >> > >        public String getDagRunId() {
> >> > >            return dagRunId;
> >> > >        }
> >> > >
> >> > >        public String getExecutionDate() {
> >> > >            return executionDate;
> >> > >        }
> >> > >
> >> > >        public String getConf() {
> >> > >            return conf;
> >> > >        }
> >> > >
> >> > >        public DagRun(String dagRunId, String executionDate, String
> >> conf){
> >> > >            this.dagRunId = dagRunId;
> >> > >            this.executionDate =executionDate;
> >> > >            this.conf = conf;
> >> > >        }
> >> > >    }
> >> > >
> >> > > J.
> >> > >
> >> > >
> >> > > On Fri, May 29, 2020 at 8:40 AM Jarek Potiuk <jarek.pot...@polidea.com
> >> >
> >> > > wrote:
> >> > >
> >> > >> After giving it quite some time to try and think about it this
> >> morning,
> >> > >> and looking at consequences - I am in strong favour of passing string
> >> as
> >> > >> conf (Kamil's proposal).
> >> > >>
> >> > >> I don't think the dictionary is good. And trying to accommodate both
> >> > >> is I
> >> > >> think combining the worst of both worlds. Let me explain why.
> >> > >>
> >> > >> *Root problem*
> >> > >>
> >> > >> I think the root problem is not in the interface/API but the
> >> > >> ambiguity of
> >> > >> the DagRun.conf field.  The way it is defined now, It can
> >> actually be
> >> an
> >> > >> arbitrary object, not even a dictionary. I could pass it an
> >> array,  or
> >> > >> bool, or pretty much any serializable object it seems. Except
> >> for a
> >> few
> >> > >> tests where both "string" and "dict" are accepted in the old
> >> experimental
> >> > >> API I do not see anywhere (please correct me if I am wrong) any kind
> >> of
> >> > >> specification for the conf object. That indeed makes it rather
> >> hard to
> >> > >> reason about it in statically typed languages. And while I understand
> >> why
> >> > >> we need it and why in Python environment we are fairly relaxed about
> >> this
> >> > >> (JINJA for example) - I do not think this should influence the
> >> complexity
> >> > >> of the API "structure".
> >> > >>
> >> > >> *API Structure: *
> >> > >>
> >> > >> I think JSON structure in the API should be fixed and well
> >> defined. I
> >> > >> think it is much better to be very explicit about the "conf"
> >> > >> parameter that
> >> > >> it is "string JSON representation" than the arbitrary object. JSON
> >> > >> is, of
> >> > >> course, dynamic but in case of the API, it is just a tool that is
> >> > >> used to
> >> > >> pass the data, and there should be some well-defined and fixed logic
> >> about
> >> > >> the structure.Precisely to make it easier to parse and prepare.  I
> >> > >> have a
> >> > >> feeling that putting that dynamic nature of JSON into API structure
> >> > >> definition is quite an abuse of that dynamic nature.
> >> > >>
> >> > >> *Java case (and other static-typed languages)*
> >> > >>
> >> > >> I haven't written Java for years, but I decided to give it a
> >> try. I
> >> tried
> >> > >> to see how complex it would be to write serialization/deserialization
> >> for
> >> > >> that using one of the most common parsers in Java world - Gson.
> >> > >>
> >> > >> The string case is super simple:
> >> > >>
> >> > >>         public JsonElement serialize(final DagRun dagRun, final Type
> >> type,
> >> > >> final JsonSerializationContext context) {
> >> > >>             JsonObject result = new JsonObject();
> >> > >>             result.add("dag_run_id", new
> >> > >> JsonPrimitive(dagRun.getDagRunId()));
> >> > >>             result.add("execution_date", new
> >> > >> JsonPrimitive(dagRun.getExecutionDate()));
> >> > >>             result.add("conf",new JsonPrimitive(dagRun.getConf()));
> >> > >>             return result;
> >> > >>         }
> >> > >>
> >> > >> The dynamic object of arbitrary complexity I do not even know
> >> how to
> >> > >> approach. Eventually what I would have to do is to convert it to JSON
> >> > >> String anyway - because that's the only way you can keep arbitrary
> >> complex
> >> > >> structure in Java.
> >> > >>
> >> > >> *Deferring problem?*
> >> > >>
> >> > >> Also - I do not think we are deferring the problem for later. The
> >> > >> thing is
> >> > >> that the only entity that cares about the "content" of the conf being
> >> > >> accessible as an object is the Python DAG reading the conf object
> >> (likely
> >> > >> with some JINJA templates).
> >> > >>
> >> > >> We will likely never, ever have to parse,  de-jsonize the string on
> >> the
> >> > >> client-side. We will just have to prepare it (to send) and possibly
> >> display
> >> > >> it (if we ever read that conf via API).
> >> > >> The imaginary client communicating with Airflow will simply pass
> >> whatever
> >> > >> the User will tell it to do. And IMHO this is far easier if we
> >> do not
> >> have
> >> > >> to convert it to object on the flight .
> >> > >>
> >> > >> I can imagine several use cases for that method:
> >> > >>
> >> > >> 1) User types - by hand - the whole "conf" object to pass to the
> >> trigger
> >> > >> method. Likely typing JSON-string directly. That's a typical
> >> case for
> >> any
> >> > >> kind of CLI I can imagine - where usually you pass JSON string for
> >> > >> arbitrary complex objects.
> >> > >>
> >> > >> 2) The client-side implementation will define some limited set of
> >> > >> parameters that could be used by the user and let the user enter it
> >> > >> via a
> >> > >> form. Based on that a "conf" object will be created following
> >> predefined
> >> > >> structure (specific for this case). For example, the user
> >> chooses a
> >> > >> date to
> >> > >> run, and the form produces  {"date: "${DATE_CHOSEN_BY_THE_USER}" }
> >> object.
> >> > >>
> >> > >> 3) Theoretically, it's possible that user enters arbitrary complex
> >> OBJECT
> >> > >> via any kind of structured "generic" interface. That would be a
> >> nightmare,
> >> > >> however (both from the user and developer point of view). So I
> >> disregard
> >> > >> that option.
> >> > >>
> >> > >> In case of 1) we would have to parse the data ??? and turn it into
> >> > >> object to serialize. Python can do that, but Java can't easily. And
> >> > >> well -
> >> > >> there is no point in doing it - we cannot do anything with conf as
> >> object,
> >> > >> we have no idea if it is valid or not, we have to anyhow pass it to
> >> > >> DAG so
> >> > >> that DAG can access whatever fields are needed. I think we would be
> >> better
> >> > >> to pass it as the very string user entered (After we check if
> >> this is
> >> a
> >> > >> valid json - which we can do easily).
> >> > >>
> >> > >> In case of 2) it's also super easy to turn such pre-defined structure
> >> into
> >> > >> JSON string. It's trivial in any language. There is virtually no
> >> > >> benefit of
> >> > >> passing it as object. - the "slightly" better readabilty maybe the
> >> > >> only one
> >> > >>
> >> > >> Of course - as Ash and Kaxil mentioned, we could pass both -
> >> string or
> >> > >> dict, but I think that is very wrong. Should we also allow array
> >> > >> (this is a
> >> > >> valid json structure)?  Should we allow passing standalone bool, int
> >> ..
> >> > >> objects (they are valid json).  Also how about sending "STRING" as
> >> conf
> >> > >> value (is it string or is it json-encoded object). This is a
> >> bad, bad
> >> idea.
> >> > >>
> >> > >> So summarizing - I am strongly for passing "string" rather than
> >> object.
> >> > >>
> >> > >> J.
> >> > >>
> >> > >>
> >> > >> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com>
> >> wrote:
> >> > >>
> >> > >>> If the only problem is with Java and not any other popular
> >> > >>> languages, I
> >> > >>> would say we go for Option (2).
> >> > >>>
> >> > >>> If not, supporting both is a good idea.
> >> > >>>
> >> > >>> Regards,
> >> > >>> Kaxil
> >> > >>>
> >> > >>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote:
> >> > >>>
> >> > >>> > While I understand the difficulty of dealing with nested json
> >> without
> >> > >>> > predefined schemas, I feel like returning it as a string only
> >> delays
> >> > >>> > the problem to a later stage since the user will still need to
> >> parse
> >> > >>> > that string into a strongly typed data structure in order to read
> >> the
> >> > >>> > values.
> >> > >>> >
> >> > >>> > I don't have much experience in Java so I can't really
> >> comment on
> >> > >>> > that. But I can confirm that it's pretty straightforward to deal
> >> with
> >> > >>> > this in C/C++, Rust and Go.
> >> > >>> >
> >> > >>> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org>
> >> > >>> wrote:
> >> > >>> > >
> >> > >>> > > Hi everyone,
> >> > >>> > >
> >> > >>> > > We're really close to getting the OpenAPI spec merged, just one
> >> last
> >> > >>> > > question that's come up around how we should handle/represent
> >> > >>> > > dagrun.conf to triggerDagRun.
> >> > >>> > >
> >> > >>> > > Which of the these two do people prefer?
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> >> > >>> > > Content-Type: application/json
> >> > >>> > >
> >> > >>> > > {
> >> > >>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> >> > >>> > >   "execution_date": "2020-05-28T21:42:36Z",
> >> > >>> > >   "conf": "{\"key\": \"value\" }"
> >> > >>> > > }
> >> > >>> > >
> >> > >>> > > OR
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> >> > >>> > > Content-Type: application/json
> >> > >>> > >
> >> > >>> > > {
> >> > >>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> >> > >>> > >   "execution_date": "2020-05-28T21:42:36Z",
> >> > >>> > >   "conf": {"key": "value"}
> >> > >>> > > }
> >> > >>> > >
> >> > >>> > > i.e. should the schema/type of conf be a (JSON-encoded) string,
> >> > >>> or an
> >> > >>> > object.
> >> > >>> > >
> >> > >>> > > I favour the later, Kamil the former. His point is that staticly
> >> typed
> >> > >>> > > languages, and Java in particular, would be hard to represent
> >> this.
> >> > >>> > > (Please correct me if I've over-simplified or misunderstood your
> >> > >>> > > argument Kamil)
> >> > >>> > >
> >> > >>> > > Mine was that it's easy enough in Go, for example
> >> trigger(dagRunId
> >> > >>> str,
> >> > >>> > > executionDate *time.Time, conf interface{})`, and double json
> >> encoding
> >> > >>> > > is always messy/a pain to drive manually on cURL etc.
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > (Using dagRun.conf is quite rare right now, doubly so via the
> >> > >>> API, so
> >> > >>> I
> >> > >>> > > don't think we have any precendent to follow.)
> >> > >>> > >
> >> > >>> > > Or does anyone feel strongly that we should support both, and
> >> have
> >> > >>> this
> >> > >>> > > in the python side
> >> > >>> > >
> >> > >>> > >     if conf:
> >> > >>> > >         if isinstance(conf, dict):
> >> > >>> > >             run_conf = conf
> >> > >>> > >         else:
> >> > >>> > >             run_conf = json.loads(conf)
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > -ash
> >> > >>> >
> >> > >>>
> >> > >>
> >> > >>
> >> > >> --
> >> > >>
> >> > >> Jarek Potiuk
> >> > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> > >>
> >> > >> M: +48 660 796 129 <+48660796129>
> >> > >> [image: Polidea] <https://www.polidea.com/>
> >> > >>
> >> > >>
> >> > >
> >> > > --
> >> > >
> >> > > Jarek Potiuk
> >> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> > >
> >> > > M: +48 660 796 129 <+48660796129>
> >> > > [image: Polidea] <https://www.polidea.com/>
> >> > >
> >>
> >

Reply via email to