Option 2 - IMO it’s the path of least surprise. Also, having to encode JSON as 
string can be cumbersome.

Supporting both options is a really no-no to me.

Bas

> On 29 May 2020, at 08:40, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> 
> After giving it quite some time to try and think about it this morning, and
> looking at consequences - I am in strong favour of passing string as conf
> (Kamil's proposal).
> 
> I don't think the dictionary is good. And trying to accommodate both is I
> think combining the worst of both worlds. Let me explain why.
> 
> *Root problem*
> 
> I think the root problem is not in the interface/API but the ambiguity of
> the DagRun.conf field.  The way it is defined now, It can actually be an
> arbitrary object, not even a dictionary. I could pass it an array,  or
> bool, or pretty much any serializable object it seems. Except for a few
> tests where both "string" and "dict" are accepted in the old experimental
> API I do not see anywhere (please correct me if I am wrong) any kind of
> specification for the conf object. That indeed makes it rather hard to
> reason about it in statically typed languages. And while I understand why
> we need it and why in Python environment we are fairly relaxed about this
> (JINJA for example) - I do not think this should influence the complexity
> of the API "structure".
> 
> *API Structure: *
> 
> I think JSON structure in the API should be fixed and well defined. I think
> it is much better to be very explicit about the "conf" parameter that it is
> "string JSON representation" than the arbitrary object. JSON is, of course,
> dynamic but in case of the API, it is just a tool that is used to pass the
> data, and there should be some well-defined and fixed logic about the
> structure.Precisely to make it easier to parse and prepare.  I have a
> feeling that putting that dynamic nature of JSON into API structure
> definition is quite an abuse of that dynamic nature.
> 
> *Java case (and other static-typed languages)*
> 
> I haven't written Java for years, but I decided to give it a try. I tried
> to see how complex it would be to write serialization/deserialization for
> that using one of the most common parsers in Java world - Gson.
> 
> The string case is super simple:
> 
>        public JsonElement serialize(final DagRun dagRun, final Type type,
> final JsonSerializationContext context) {
>            JsonObject result = new JsonObject();
>            result.add("dag_run_id", new
> JsonPrimitive(dagRun.getDagRunId()));
>            result.add("execution_date", new
> JsonPrimitive(dagRun.getExecutionDate()));
>            result.add("conf",new JsonPrimitive(dagRun.getConf()));
>            return result;
>        }
> 
> The dynamic object of arbitrary complexity I do not even know how to
> approach. Eventually what I would have to do is to convert it to JSON
> String anyway - because that's the only way you can keep arbitrary complex
> structure in Java.
> 
> *Deferring problem?*
> 
> Also - I do not think we are deferring the problem for later. The thing is
> that the only entity that cares about the "content" of the conf being
> accessible as an object is the Python DAG reading the conf object (likely
> with some JINJA templates).
> 
> We will likely never, ever have to parse,  de-jsonize the string on the
> client-side. We will just have to prepare it (to send) and possibly display
> it (if we ever read that conf via API).
> The imaginary client communicating with Airflow will simply pass whatever
> the User will tell it to do. And IMHO this is far easier if we do not have
> to convert it to object on the flight .
> 
> I can imagine several use cases for that method:
> 
> 1) User types - by hand - the whole "conf" object to pass to the trigger
> method. Likely typing JSON-string directly. That's a typical case for any
> kind of CLI I can imagine - where usually you pass JSON string for
> arbitrary complex objects.
> 
> 2) The client-side implementation will define some limited set of
> parameters that could be used by the user and let the user enter it via a
> form. Based on that a "conf" object will be created following predefined
> structure (specific for this case). For example, the user chooses a date to
> run, and the form produces  {"date: "${DATE_CHOSEN_BY_THE_USER}" } object.
> 
> 3) Theoretically, it's possible that user enters arbitrary complex OBJECT
> via any kind of structured "generic" interface. That would be a nightmare,
> however (both from the user and developer point of view). So I disregard
> that option.
> 
> In case of 1) we would have to parse the data ??? and turn it into
> object to serialize. Python can do that, but Java can't easily. And well -
> there is no point in doing it - we cannot do anything with conf as object,
> we have no idea if it is valid or not, we have to anyhow pass it to DAG so
> that DAG can access whatever fields are needed. I think we would be better
> to pass it as the very string user entered (After we check if this is a
> valid json - which we can do easily).
> 
> In case of 2) it's also super easy to turn such pre-defined structure into
> JSON string. It's trivial in any language. There is virtually no benefit of
> passing it as object. - the "slightly" better readabilty maybe the only one
> 
> Of course - as Ash and Kaxil mentioned, we could pass both - string or
> dict, but I think that is very wrong. Should we also allow array (this is a
> valid json structure)?  Should we allow passing standalone bool, int ..
> objects (they are valid json).  Also how about sending "STRING" as conf
> value (is it string or is it json-encoded object). This is a bad, bad idea.
> 
> So summarizing - I am strongly for passing "string" rather than object.
> 
> J.
> 
> 
> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> 
>> If the only problem is with Java and not any other popular languages, I
>> would say we go for Option (2).
>> 
>> If not, supporting both is a good idea.
>> 
>> Regards,
>> Kaxil
>> 
>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote:
>> 
>>> While I understand the difficulty of dealing with nested json without
>>> predefined schemas, I feel like returning it as a string only delays
>>> the problem to a later stage since the user will still need to parse
>>> that string into a strongly typed data structure in order to read the
>>> values.
>>> 
>>> I don't have much experience in Java so I can't really comment on
>>> that. But I can confirm that it's pretty straightforward to deal with
>>> this in C/C++, Rust and Go.
>>> 
>>> On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org>
>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> We're really close to getting the OpenAPI spec merged, just one last
>>>> question that's come up around how we should handle/represent
>>>> dagrun.conf to triggerDagRun.
>>>> 
>>>> Which of the these two do people prefer?
>>>> 
>>>> 
>>>> POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
>>>> Content-Type: application/json
>>>> 
>>>> {
>>>>  "dag_run_id": "manual_2020-05-28T21:42:36Z",
>>>>  "execution_date": "2020-05-28T21:42:36Z",
>>>>  "conf": "{\"key\": \"value\" }"
>>>> }
>>>> 
>>>> OR
>>>> 
>>>> 
>>>> POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
>>>> Content-Type: application/json
>>>> 
>>>> {
>>>>  "dag_run_id": "manual_2020-05-28T21:42:36Z",
>>>>  "execution_date": "2020-05-28T21:42:36Z",
>>>>  "conf": {"key": "value"}
>>>> }
>>>> 
>>>> i.e. should the schema/type of conf be a (JSON-encoded) string, or an
>>> object.
>>>> 
>>>> I favour the later, Kamil the former. His point is that staticly typed
>>>> languages, and Java in particular, would be hard to represent this.
>>>> (Please correct me if I've over-simplified or misunderstood your
>>>> argument Kamil)
>>>> 
>>>> Mine was that it's easy enough in Go, for example trigger(dagRunId str,
>>>> executionDate *time.Time, conf interface{})`, and double json encoding
>>>> is always messy/a pain to drive manually on cURL etc.
>>>> 
>>>> 
>>>> (Using dagRun.conf is quite rare right now, doubly so via the API, so I
>>>> don't think we have any precendent to follow.)
>>>> 
>>>> Or does anyone feel strongly that we should support both, and have this
>>>> in the python side
>>>> 
>>>>    if conf:
>>>>        if isinstance(conf, dict):
>>>>            run_conf = conf
>>>>        else:
>>>>            run_conf = json.loads(conf)
>>>> 
>>>> 
>>>> -ash
>>> 
>> 
> 
> 
> -- 
> 
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> 
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>

Reply via email to