potiuk commented on issue #14529: URL: https://github.com/apache/airflow/issues/14529#issuecomment-1133066140
> @potiuk I hope you don't feel uncomfortable on my follow-up questions.

Surely I don't mind. And I am not arguing. I am just trying to explain the complexity - precisely because you, as a user, might not know the complexities involved. My role here is to make you aware of them and set some expectations. On the other hand, you think things are easy precisely because you do not know the details, so I am trying to explain them to you. I hope you can learn something from that - for example, that hastily saying something is "easy" when you have no good understanding of it is usually very premature.

It's fine if you have to follow your own judgment and "pay" the price of a mistake in the assessment of difficulty. That's why I very considerately return the discussion back to you: "if you think it is easy - do it". Plain and simple. This is precisely what I (deliberately) do with many people I have mentored or worked with - when I think something is complex, I explain why and we discuss. If the other person is not convinced, I very consciously and deliberately say to that person: OK, if you are not convinced, please feel free to do it and let's see. Sometimes the best inventions are made by people who thought that things were simple when everyone else thought they were complex. Sometimes in such a situation I am positively surprised; sometimes the other person gets to bite the bullet and learn the hard way that things are harder than they look on the surface.

This is precisely such a moment - feel free to do it. You will learn. If you do it and it takes far less time and turns out to be simpler than I thought (assuming it passes all the testing, works at scale and so on) - this is even better. I will learn something new, and we will get a cool feature. I really like when this happens - so by all means - please, do it. Implement it if you think it is simple. This will be a great feature to have!
And if not - then we have a great source of information on things to look at when someone actually WILL start implementing it - they can look at the points raised here and check whether their solution addresses them.

BTW, I am also not sure if you are aware that the discussion here is pretty academic. Until there is a person who would like to implement it, it will simply not happen. No matter what. If you want this feature in - there is no-one at the "top" to convince that this should be done. There is no-one here who "decides" what should be done. Things are done here because someone does them. So in a way it does not matter whether we agree here or not - the surest way to get this feature (if you really want it) is ... to implement it. That's it. So my comments are really: yeah, go ahead and implement it. If you think it's easy - cool. But I just pointed out the problems you (or anyone else here) will have to solve.

But answering your question:

> Web UI has a feature to clear a task and its all downstream tasks. What happens if there are 100s or 1000s of downstream tasks?

Yes, it's very, very, very different. When you clear the downstream tasks, you can do it in one big transaction (i.e. quickly) because they depend directly on the original task and their state depends directly on the state of your task, so you can clear them in, mathematically speaking, "topological" order. A Task Group has no such notion. Tasks in a task group might be arbitrarily connected or not connected at all, and they can have various inter-dependencies. You cannot do it reliably because even if the tasks are grouped in the UI, they might relate to various parts of the tree with some interdependencies between them, and there is no clear topological order in which you can clear all the tasks in a single transaction. Rather than a direct downstream dependency, there might be multiple levels of hierarchy - i.e.
you can have a task hierarchy: A -> B -> C where A and C belong to the task group and B does not. What happens when you clear the group? Clearing A might in turn cause transitive clearing of B, which in turn might cause "starting" of C (depending on the type of dependency), but then you also attempt to clear C at the same time. Should the C task be started or cleared in this case? Are there other cases like that? Likely. That's why I think trying to do it in one big transaction is likely to lead to deadlocks.

But if you think otherwise, have thought it through, and have answers to all the possible edge cases here - by all means, please implement it. Make a PR. Let's review it :)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
