Do I correctly understand that users can specify conf="[1,2,3]" as this is also a valid JSON?
T. On Fri, May 29, 2020 at 4:18 PM Kamil Breguła <kamil.breg...@polidea.com> wrote: > The problem is with mixing objects that have a known structure with > objects that do not have this structure. Common JSON parser in Java > can generate objects if all the attributes have a known type. > Attributes that are unrecognized are ignored because Java cannot > dynamically generate classes. This problem is very visible when > reading a list of objects from the API. > > For example. We have the following DAGRun. > DagRun(conf='{"A":"B"}') > DagRun(conf='{"C":"D"}') > They can be represented by the following JSON objects. > {"conf": {"A":"B"}} > {"conf": {"C":"D"}} > > A type representing the DAG Run is automatically generated based on > the API specification. I prepared an example that shows such a type in > many different technologies for AJava > I used official library generators for OpenAPI > https://github.com/OpenAPITools/openapi-generator > Example class definitions > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L64 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-feign/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey1/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-jersey2/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-microprofile/src/main/java/org/openapitools/client/model/DAGRun.java#L64 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-native/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-okhttp-gson/src/main/java/org/openapitools/client/model/DAGRun.java#L63 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-rest-assured/src/main/java/org/openapitools/client/model/DAGRun.java#L64 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resteasy/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-resttemplate/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-retrofit2/src/main/java/org/openapitools/client/model/DAGRun.java#L63 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-vertx/src/main/java/org/openapitools/client/model/DAGRun.java#L65 > > https://github.com/mik-laj/airflow-api-clients/blob/master/out/object/java-webclient/src/main/java/org/openapitools/client/model/DAGRun.java#L64 > None allowed to generate a class structure that allows reading the > attributes contained in the conf field if type == object. > > Libraries for other languages and with different fields are also > available in the repository. > > As an object: > Patch: > https://github.com/mik-laj/airflow-api-clients/blob/master/spec-object.patch > Libraries: > https://github.com/mik-laj/airflow-api-clients/tree/master/out/object > > Two field types at the same time: > Patch: > https://github.com/mik-laj/airflow-api-clients/blob/master/spec-oneoff.patch > Libraries: > https://github.com/mik-laj/airflow-api-clients/tree/master/out/oneoff > > Without modification (string): > Libraries: > https://github.com/mik-laj/airflow-api-clients/tree/master/out/string > > > I have the biggest concerns about reading data, not about sending data > so in my opinion we should think about how we want the response from > the server to look like? > > GET //dags/{DAG_ID}/dagRuns/ > > { > "dag_runs": [ > {"conf": {"A":"B"}} > {"conf": {"C":"D"}} > ] > } > > or > "dag_runs": [ > {"conf": "{\"A\":\"B\"}} > {"conf": "{\"C\":\"D\"}} > ] > } > > If we choose one output format, we can think about whether we want to > make changes to this object structure so that the input object does > not match the output object. In my opinion, this is not a good idea. > REST is based on resources/types that have a unified interface and > fields that are ambiguous can be problematic for users. > > In my opinion, the problem is that the field format and the message > format is JSON. This is just a coincidence. In the future, we may add > a different message format, e.g. protobuff, and we will not be able to > provide common schema. We will have to introduce discrepancies between > spec for protobuff and spec for JSON when we change only transform > format. > > On Fri, May 29, 2020 at 12:08 PM Ash Berlin-Taylor <a...@apache.org> wrote: > > > > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > > > *Root problem* > > > > > > I think the root problem is not in the interface/API but the ambiguity > of > > > the DagRun.conf field. The way it is defined now, It can actually be an > > > arbitrary object, not even a dictionary. > > > > So that is a gap in the definition. It _needs_ to be a dictionary/JSON > > *Object* specifically, else it will break when it is used. > > > > > but in case of the API, it is just a tool that is used to pass the > > > data, and there should be some well-defined and fixed logic about the > > > structure > > > > My view is that specifying it as a JSON object in the spec (but saying > > nothing about the properties it might contain) is more precise than a > > string. Compare these two snippets of OpenAPI spec: > > > > > > conf: > > type: string > > description: > > > JSON string (which must contain a top-level JSON object) > > describing additional configuration parameters. > > > > vs > > > > conf: > > type: object > > description: > > > JSON object describing additional configuration parameters. > > > > > > The first one means technically, acording to the spec you can submit `{ > > "conf": "abc" }` -- but that will fail. Where as the second means that > > both the OpenAPI client, and the automatic validation from Connexion on > > the server will handle that for us. > > > > (With the string case, we would have to JSON deserialize it and check > > it's an JSON object/ python dict. Nothing else is allowed.) > > > > > *Java case (and other static-typed languages)* > > > > import com.google.gson.Gson; > > import com.google.gson.JsonObject; > > > > class Demo { > > > > public static class Employee > > { > > private Integer id; > > private String firstName; > > private String lastName; > > > > public Employee(Integer id, String firstName, String lastName){ > > this.id = id; > > this.firstName = firstName; > > this.lastName = lastName; > > } > > } > > > > public static void main(String args[]) { > > > > Demo.Employee employee = new Demo.Employee(1, "Ash", > "Berlin-Taylor"); > > > > Gson gson = new Gson(); > > > > JsonObject e = gson.toJsonTree(employee).getAsJsonObject(); > > > > // Just for debugging/testing > > System.out.println(e.toString()); > > > > } > > } > > > > which outputs > > > > {"id":1,"firstName":"Ash","lastName":"Berlin-Taylor"} > > > > > > Entirely possible to create arbitrary JSON structures in Java. (You can > > also just create a com.google.gson.JsonObject directly and call `.add()`) > > > > To use your example DagRun class, the signature could become: > > > > public DagRun(String dagRunId, String executionDate, Object conf); > > > > and then the client can do `gson.toJsonTree(conf).getAsJsonObject()` > internally. > > > > > > So I'm still +1 for dict directly. Even more so now I have written this > > in Java. > > > > JSON-encoded-string in a JSON api is a "code smell" to me. > > > > -ash > > > > > > On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > > > > > And one more comment to add - this is the DagRun class I wrote for the > > > imaginary Java client. That's all (including the serializer from the > > > previous mail) needed to be able to easily build the POST method call > in > > > Java. > > > I would really like to challenge someone more experienced with Java > writes > > > or provides some examples, either the class holding arbitrary complex > conf > > > object or (if we stick to String way of storing the String) - to > > > serialize/deserialise the object from son. > > > > > > I really believe it is far from trivial and by choosing "object" way of > > > sending the conf, we significantly increase the complexity of clients > > > accessing Airflow API (which should be our goal) at the expense of > > > "slightly" less readable code. > > > > > > J. > > > > > > > > > public static class DagRun { > > > private final String dagRunId; > > > private final String executionDate; > > > private final String conf; > > > > > > public String getDagRunId() { > > > return dagRunId; > > > } > > > > > > public String getExecutionDate() { > > > return executionDate; > > > } > > > > > > public String getConf() { > > > return conf; > > > } > > > > > > public DagRun(String dagRunId, String executionDate, String > conf){ > > > this.dagRunId = dagRunId; > > > this.executionDate =executionDate; > > > this.conf = conf; > > > } > > > } > > > > > > J. > > > > > > > > > On Fri, May 29, 2020 at 8:40 AM Jarek Potiuk <jarek.pot...@polidea.com > > > > > wrote: > > > > > >> After giving it quite some time to try and think about it this > morning, > > >> and looking at consequences - I am in strong favour of passing string > as > > >> conf (Kamil's proposal). > > >> > > >> I don't think the dictionary is good. And trying to accommodate both > > >> is I > > >> think combining the worst of both worlds. Let me explain why. > > >> > > >> *Root problem* > > >> > > >> I think the root problem is not in the interface/API but the > > >> ambiguity of > > >> the DagRun.conf field. The way it is defined now, It can actually be > an > > >> arbitrary object, not even a dictionary. I could pass it an array, or > > >> bool, or pretty much any serializable object it seems. Except for a > few > > >> tests where both "string" and "dict" are accepted in the old > experimental > > >> API I do not see anywhere (please correct me if I am wrong) any kind > of > > >> specification for the conf object. That indeed makes it rather hard to > > >> reason about it in statically typed languages. And while I understand > why > > >> we need it and why in Python environment we are fairly relaxed about > this > > >> (JINJA for example) - I do not think this should influence the > complexity > > >> of the API "structure". > > >> > > >> *API Structure: * > > >> > > >> I think JSON structure in the API should be fixed and well defined. I > > >> think it is much better to be very explicit about the "conf" > > >> parameter that > > >> it is "string JSON representation" than the arbitrary object. JSON > > >> is, of > > >> course, dynamic but in case of the API, it is just a tool that is > > >> used to > > >> pass the data, and there should be some well-defined and fixed logic > about > > >> the structure.Precisely to make it easier to parse and prepare. I > > >> have a > > >> feeling that putting that dynamic nature of JSON into API structure > > >> definition is quite an abuse of that dynamic nature. > > >> > > >> *Java case (and other static-typed languages)* > > >> > > >> I haven't written Java for years, but I decided to give it a try. I > tried > > >> to see how complex it would be to write serialization/deserialization > for > > >> that using one of the most common parsers in Java world - Gson. > > >> > > >> The string case is super simple: > > >> > > >> public JsonElement serialize(final DagRun dagRun, final Type > type, > > >> final JsonSerializationContext context) { > > >> JsonObject result = new JsonObject(); > > >> result.add("dag_run_id", new > > >> JsonPrimitive(dagRun.getDagRunId())); > > >> result.add("execution_date", new > > >> JsonPrimitive(dagRun.getExecutionDate())); > > >> result.add("conf",new JsonPrimitive(dagRun.getConf())); > > >> return result; > > >> } > > >> > > >> The dynamic object of arbitrary complexity I do not even know how to > > >> approach. Eventually what I would have to do is to convert it to JSON > > >> String anyway - because that's the only way you can keep arbitrary > complex > > >> structure in Java. > > >> > > >> *Deferring problem?* > > >> > > >> Also - I do not think we are deferring the problem for later. The > > >> thing is > > >> that the only entity that cares about the "content" of the conf being > > >> accessible as an object is the Python DAG reading the conf object > (likely > > >> with some JINJA templates). > > >> > > >> We will likely never, ever have to parse, de-jsonize the string on > the > > >> client-side. We will just have to prepare it (to send) and possibly > display > > >> it (if we ever read that conf via API). > > >> The imaginary client communicating with Airflow will simply pass > whatever > > >> the User will tell it to do. And IMHO this is far easier if we do not > have > > >> to convert it to object on the flight . > > >> > > >> I can imagine several use cases for that method: > > >> > > >> 1) User types - by hand - the whole "conf" object to pass to the > trigger > > >> method. Likely typing JSON-string directly. That's a typical case for > any > > >> kind of CLI I can imagine - where usually you pass JSON string for > > >> arbitrary complex objects. > > >> > > >> 2) The client-side implementation will define some limited set of > > >> parameters that could be used by the user and let the user enter it > > >> via a > > >> form. Based on that a "conf" object will be created following > predefined > > >> structure (specific for this case). For example, the user chooses a > > >> date to > > >> run, and the form produces {"date: "${DATE_CHOSEN_BY_THE_USER}" } > object. > > >> > > >> 3) Theoretically, it's possible that user enters arbitrary complex > OBJECT > > >> via any kind of structured "generic" interface. That would be a > nightmare, > > >> however (both from the user and developer point of view). So I > disregard > > >> that option. > > >> > > >> In case of 1) we would have to parse the data ??? and turn it into > > >> object to serialize. Python can do that, but Java can't easily. And > > >> well - > > >> there is no point in doing it - we cannot do anything with conf as > object, > > >> we have no idea if it is valid or not, we have to anyhow pass it to > > >> DAG so > > >> that DAG can access whatever fields are needed. I think we would be > better > > >> to pass it as the very string user entered (After we check if this is > a > > >> valid json - which we can do easily). > > >> > > >> In case of 2) it's also super easy to turn such pre-defined structure > into > > >> JSON string. It's trivial in any language. There is virtually no > > >> benefit of > > >> passing it as object. - the "slightly" better readabilty maybe the > > >> only one > > >> > > >> Of course - as Ash and Kaxil mentioned, we could pass both - string or > > >> dict, but I think that is very wrong. Should we also allow array > > >> (this is a > > >> valid json structure)? Should we allow passing standalone bool, int > .. > > >> objects (they are valid json). Also how about sending "STRING" as > conf > > >> value (is it string or is it json-encoded object). This is a bad, bad > idea. > > >> > > >> So summarizing - I am strongly for passing "string" rather than > object. > > >> > > >> J. > > >> > > >> > > >> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> > wrote: > > >> > > >>> If the only problem is with Java and not any other popular > > >>> languages, I > > >>> would say we go for Option (2). > > >>> > > >>> If not, supporting both is a good idea. > > >>> > > >>> Regards, > > >>> Kaxil > > >>> > > >>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote: > > >>> > > >>> > While I understand the difficulty of dealing with nested json > without > > >>> > predefined schemas, I feel like returning it as a string only > delays > > >>> > the problem to a later stage since the user will still need to > parse > > >>> > that string into a strongly typed data structure in order to read > the > > >>> > values. > > >>> > > > >>> > I don't have much experience in Java so I can't really comment on > > >>> > that. But I can confirm that it's pretty straightforward to deal > with > > >>> > this in C/C++, Rust and Go. > > >>> > > > >>> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org> > > >>> wrote: > > >>> > > > > >>> > > Hi everyone, > > >>> > > > > >>> > > We're really close to getting the OpenAPI spec merged, just one > last > > >>> > > question that's come up around how we should handle/represent > > >>> > > dagrun.conf to triggerDagRun. > > >>> > > > > >>> > > Which of the these two do people prefer? > > >>> > > > > >>> > > > > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} > > >>> > > Content-Type: application/json > > >>> > > > > >>> > > { > > >>> > > "dag_run_id": "manual_2020-05-28T21:42:36Z", > > >>> > > "execution_date": "2020-05-28T21:42:36Z", > > >>> > > "conf": "{\"key\": \"value\" }" > > >>> > > } > > >>> > > > > >>> > > OR > > >>> > > > > >>> > > > > >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} > > >>> > > Content-Type: application/json > > >>> > > > > >>> > > { > > >>> > > "dag_run_id": "manual_2020-05-28T21:42:36Z", > > >>> > > "execution_date": "2020-05-28T21:42:36Z", > > >>> > > "conf": {"key": "value"} > > >>> > > } > > >>> > > > > >>> > > i.e. should the schema/type of conf be a (JSON-encoded) string, > > >>> or an > > >>> > object. > > >>> > > > > >>> > > I favour the later, Kamil the former. His point is that staticly > typed > > >>> > > languages, and Java in particular, would be hard to represent > this. > > >>> > > (Please correct me if I've over-simplified or misunderstood your > > >>> > > argument Kamil) > > >>> > > > > >>> > > Mine was that it's easy enough in Go, for example > trigger(dagRunId > > >>> str, > > >>> > > executionDate *time.Time, conf interface{})`, and double json > encoding > > >>> > > is always messy/a pain to drive manually on cURL etc. > > >>> > > > > >>> > > > > >>> > > (Using dagRun.conf is quite rare right now, doubly so via the > > >>> API, so > > >>> I > > >>> > > don't think we have any precendent to follow.) > > >>> > > > > >>> > > Or does anyone feel strongly that we should support both, and > have > > >>> this > > >>> > > in the python side > > >>> > > > > >>> > > if conf: > > >>> > > if isinstance(conf, dict): > > >>> > > run_conf = conf > > >>> > > else: > > >>> > > run_conf = json.loads(conf) > > >>> > > > > >>> > > > > >>> > > -ash > > >>> > > > >>> > > >> > > >> > > >> -- > > >> > > >> Jarek Potiuk > > >> Polidea <https://www.polidea.com/> | Principal Software Engineer > > >> > > >> M: +48 660 796 129 <+48660796129> > > >> [image: Polidea] <https://www.polidea.com/> > > >> > > >> > > > > > > -- > > > > > > Jarek Potiuk > > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > > > M: +48 660 796 129 <+48660796129> > > > [image: Polidea] <https://www.polidea.com/> > > > >