[GitHub] [airflow] potiuk commented on a diff in pull request #24591: Rewrite the 'version' command to use Click

GitBox Wed, 29 Jun 2022 04:56:58 -0700


potiuk commented on code in PR #24591:
URL: https://github.com/apache/airflow/pull/24591#discussion_r909539094



##########
airflow/cli/__main__.py:
##########
@@ -18,6 +18,7 @@
 # under the License.
 
 from airflow.cli import airflow_cmd
+from airflow.cli.commands import version  # noqa: F401

Review Comment:
   I think console "accessibility" can be quite easily deferred - it's easy to 
change later (however it would be great that people who take care about 
implementing other parts will have an understanding what is our approach of 
user communication is and how they should write their CLI's - but this does not 
impact design and implementation approach as much, so it can be done as "next" 
PR (but it should be done before others get engaged- to make sure we keep 
consistent approach).  
   
   But I'd say pre-commit or some other (not necessary pre-commit) gate for not 
doing unnecessary imports and making sure our CLI commands are optimized for 
the way how click works are absolutely necessary as a "founding PR" - mainly to 
work out which approach will be good and make sure any next command will follow 
it.
   
   I personally think it is quite important because this will gate contributors 
from making mistakes that are not at all obvious but they can easily compound 
into unusable CLI. Once we get out of the gates, it's very difficult to keep it 
if there is no check in place. If you look at some history, every now and then 
mainly @ashb performs such ad-hoc clean-ups which will speed-up CLI import by a 
factor of few times - impacting heavily user performance. 
   An example of this here: https://github.com/apache/airflow/pull/21438/files 
(there were a few more complex but I found this one first). I think this is a 
"known" and "expected" problem, and we should rather prevent this problem from 
even occuring - it is always cheaper to prevent stuff in the first place than 
deal with it afterwards.
   
   The thing here is that just implementing such gate might reveal some 
problems and influence the way our modules are structured, and how they intract 
with rest of airlfow. 
   
   Main problem is that we have not kept it in check from the very beginning 
and we have no verification - so any new command added might break the "fast 
import" approach (and well - they did, many times). 
   
   Problems with those command and using click, this will get far worse than 
current implementation of the CLI - if we don't gate it from the beginning, 
because we have less control over it. Click heavily relies on importing all 
modules in  all click-annotated functions and it's not possible (afaik) to 
change it. And it is super easy to make even simple mistake that leads to huge 
slow-down for example by adding a TypeHint to a function in the same module 
that has an click-annotated command).  If we don't gate it, we will likely see 
the slow-downs rapidly growing if we are not careful. 
   
   And it's not "academic" discussion. We've already been there - if you look 
at the history https://github.com/apache/airflow/pull/6594 - the original CLI 
implementation of Airflow had the very same problem  and it became unbearable 
and borderline unusable. With click we can't do this lazy import the same way 
as @mik-laj did, because click will simply make all the "click" annotated 
methods load at startup. And even worse - alll of them are actually parsed 
every time user presses <TAB> for auto-completion. This last point can likely 
be optimized by pre-computing and caching completion, but AFAIK there is no 
good implementation of such caching yet ( I thought about writing one but had 
no time to do it). But the architecture of click currently kind of "forces" you 
to write optimized command definitions.
   
   And we want click interface/autocomplete to be better than the current one - 
not to regress.
   
   Also airflow at places has rather "bad legacy" of dragging literally *half* 
of the airflow imports once some classes are imported (especially conf). One 
bad import at top level might add literally seconds to boostrap time. While 
this is something yet-to-be-solved, this one is rather difficult due to 
backwards compatibility so we rather make sure that people adding commands will 
not fall into the same trap.
   
   Those are not isolated airflow concerns - those problem even made 
"pre-commit" maintainer plainly refused accepting click as a cli solution in 
order to implement auto-complete 
(https://github.com/pre-commit/pre-commit/issues/1119 - where multiple people - 
including myself - offered help in optimizing it).  And our CLI is far more 
complex and elaborated than pre-commit one.
   
   So I believe, we need to make sure our click-annotaed files commands are 
implemented - from day 0. Otherwise we risk that some decisions made by the 
authors and the commands are implemented in the way that makes it difficult to 
change, so basically the author of the original implementation might not 
realize that decisions made when implementing it have some bad consequences and 
chooses an "easy path" instead of - for example - refactor some part of code 
and remove unnecessary couplings. The bad thing is that it will not be 
immediately visible. The nature of it is that it will get slower with every new 
command and we will only realise that we need to optimize when it will need a 
change similar to Kamil's #6594 - +4000,-3000 lines of code.
   
   I do not want to hold it off - so I'd ask what @ashb and @mik-laj  think as 
they were mainly dealing with the CLI problems of Airflow.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] potiuk commented on a diff in pull request #24591: Rewrite the 'version' command to use Click

Reply via email to