potiuk commented on PR #29821: URL: https://github.com/apache/airflow/pull/29821#issuecomment-1802925162
@gdavoidan - what I proposed and I believe @hussein-awala agrees with is that you define which fields of BaseOperator you exclude from the check. This is what my proposal is about. Find out which of those fields CAN be templated and remove them from the check. Basically allow-list of fields you want to allow templating. Rather than allow all by default. There is no need "specify" them as extra field - just hard-code them. What @hussein-awala proposes is that you attempt to test it and figure out which fields can be templated by templating then one by one and trying to serialize the DAG. because not all of them CAN be serialized and you need to serialize DAG to DB to execute it. See description ogf the original error https://github.com/apache/airflow/issues/29819 whcih explains the reason for doing it. Try to do what the author of the oriignal issue and see it for yourself when you remoge the protection. You don't need to believe anyone's word just try it and read the issues. > This sounds like a workaround for a workaround :) For some reason, you've prohibited the rendering of all the BaseOperator fields (I'm still not convinced of the need for this change), and now you're asking me to exclude some of the fields from that rule (the rule I don't even agree and the rule you can't really justify)... No. You simply didn't dig deep enough. Look at the issue and try to understand what happened there. This is solution to a real problem that overcautiously excluded all the fields. What we are kindly asking you to do is to find out which fields out of those can be templated. Just looking at your examples neither `pool` nor `doc_md` should be allowed to be tempalted. The`pool` field is evaluated by the scheduler when scheduler scheduler tasks for execution, and then when celery worker picks up the task it only selects the tasks belonging to it's queue. And it happens long before template rendering happens - template rendering happpen in the worker of cellery AFTER the worker already used `queue` field to pick tasks. Similary `doc_md` should not be templated, because it is used by the UI to display task details NOT task instance details - and in order to do do it, cannot render the field, because again - rendering happens at execution time by the task, not when the UI decides to display description of the task. Similarly `execution_timeout` cannot be rendered, because it is serialized to the DB by DAG file processor, WAY BEFORE the task gets executed, and there it is stored as INT not string (which is the original issue that this PR solved). And it mak es completely no sense to render `execution_timeout` because again, it is used BEFORE the task starts and rendering is happening in the worker - because execution_timout is applied at the monitoring level of the task in teh process that starts the actual "local task process" where rendering happen. So basically (if you look at the code) SIGARM is started and alarm on timeout is set before JINJA rendering happens. You likely would not like to do: ``` signal.setitimer(signal.ITIMER_REAL, '{{ renerd_timer }}') ``` That simply would not work. So assumption when that PR got merged that most (if not all) of the fields of BaseOperators are like that and all of them should be excluded by default. You seem to find one that might be excluded. Maybe more can be as well. What we are kindly asking you to review all of those, possibly test (as @hussein-awala proposed) and come with PR where you specifically exclude those fields that CAN be allowlisted and you tested that they work. Is it too much of an ask? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
