potiuk commented on code in PR #24591: URL: https://github.com/apache/airflow/pull/24591#discussion_r909539094
########## airflow/cli/__main__.py: ########## @@ -18,6 +18,7 @@ # under the License. from airflow.cli import airflow_cmd +from airflow.cli.commands import version # noqa: F401 Review Comment: I think console "accessibility" can be quite easily deferred - it's easy to change later (however it would be great that people who take care about implementing other parts will have an understanding what is our approach of user communication is and how they should write their CLI's - but this does not impact design and implementation approach as much, so it can be done as "next" PR (but it should be done before others get engaged- to make sure we keep consistent approach). But I'd say pre-commit or some other (not necessary pre-commit) gate for not doing unnecessary imports and making sure our CLI commands are optimized for the way how click works are absolutely necessary as a "founding PR" - mainly to work out which approach will be good and make sure any next command will follow it. I personally think it is quite important because this will gate contributors from making mistakes that are not at all obvious but they can easily compound into unusable CLI. Once we get out of the gates, it's very difficult to keep it if there is no check in place. If you look at some history, every now and then mainly @ashb performs such ad-hoc clean-ups which will speed-up CLI import by a factor of few times - impacting heavily user performance. An example of this here: https://github.com/apache/airflow/pull/21438/files (there were a few more complex but I found this one first). I think this is a "known" and "expected" problem, and we should rather prevent this problem from even occuring - it is always cheaper to prevent stuff in the first place than deal with it afterwards. The thing here is that just implementing such gate might reveal some problems and influence the way our modules are structured, and how they intract with rest of airlfow. Main problem is that we have not kept it in check from the very beginning and we have no verification - so any new command added might break the "fast import" approach (and well - they did, many times). Problems with those command and using click, this will get far worse than current implementation of the CLI - if we don't gate it from the beginning, because we have less control over it. Click heavily relies on importing all modules in all click-annotated functions and it's not possible (afaik) to change it. And it is super easy to make even simple mistake that leads to huge slow-down for example by adding a TypeHint to a function in the same module that has an click-annotated command). If we don't gate it, we will likely see the slow-downs rapidly growing if we are not careful. And it's not "academic" discussion. We've already been there - if you look at the history https://github.com/apache/airflow/pull/6594 - the original CLI implementation of Airflow had the very same problem and it became unbearable and borderline unusable. With click we can't do this lazy import the same way as @mik-laj did, because click will simply make all the "click" annotated methods load at startup. And even worse - alll of them are actually parsed every time user presses <TAB> for auto-completion. This last point can likely be optimized by pre-computing and caching completion, but AFAIK there is no good implementation of such caching yet ( I thought about writing one but had no time to do it). But the architecture of click currently kind of "forces" you to write optimized command definitions. And we want click interface/autocomplete to be better than the current one - not to regress. Also airflow at places has rather "bad legacy" of dragging literally *half* of the airflow imports once some classes are imported (especially conf). One bad import at top level might add literally seconds to boostrap time. While this is something yet-to-be-solved, this one is rather difficult due to backwards compatibility so we rather make sure that people adding commands will not fall into the same trap. Those are not isolated airflow concerns - those problem even made "pre-commit" maintainer plainly refused accepting click as a cli solution in order to implement auto-complete (https://github.com/pre-commit/pre-commit/issues/1119 - where multiple people - including myself - offered help in optimizing it). And our CLI is far more complex and elaborated than pre-commit one. So I believe, we need to make sure our click-annotaed files commands are implemented - from day 0. Otherwise we risk that some decisions made by the authors and the commands are implemented in the way that makes it difficult to change, so basically the author of the original implementation might not realize that decisions made when implementing it have some bad consequences and chooses an "easy path" instead of - for example - refactor some part of code and remove unnecessary couplings. The bad thing is that it will not be immediately visible. The nature of it is that it will get slower with every new command and we will only realise that we need to optimize when it will need a change similar to Kamil's #6594 - +4000,-3000 lines of code. I do not want to hold it off - so I'd ask what @ashb and @mik-laj think as they were mainly dealing with the CLI problems of Airflow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
