potiuk commented on pull request #18356:
URL: https://github.com/apache/airflow/pull/18356#issuecomment-922806024


   > * There are a lot of points that aren't incorrect. However, as a reader, I 
think "why?" a lot. For example here: "what kind of filesystem you have to 
share the DAGs". I assume the average reader understands there are different 
filesystems, but doesn't know how to tune Airflow for either filesystem. 
However, the text doesn't explain the consequences of using e.g. NFS vs local 
filesystem, or what to tune in the Airflow settings. Would add some context to 
those statements, or leave them out.
   
   Good point. I added a separate paragraph that says more about the resources 
involved (filesystem/I/O, database, memory).
   
   > * Would explain the implications of changing every config option. E.g. 
what happens if you set `scheduler__processor_poll_interval` very low and what 
happens if you set it very high?
   
   I think this is already largely explained in the details for each of those 
options in the "configuration" reference. In the `scheduler fine tuning` 
section I just wanted to give a general description (the link on each 
parameter leads to more details).
   
   > * Would mention the importance of proper monitoring. Without data you know 
nothing :-) Can we relate tweaking certain config options to certain metrics?
   
   Agreed, we should stress it (I added this as 'the most important point'). 
However, I would avoid explaining specific metrics. First of all, I have no 
idea and no practice here (and I think most of us don't). We know some general 
resources, their impact and the "knobs". Each deployment, filesystem, 
monitoring tool etc. has its own specific way of naming/monitoring/alerting, so 
I think a high-level "what to check?" is much better from our point of view 
than "how to check?" and "what parameters should have which values". 
   
   Also, it is a bit dangerous to be very specific. Airflow is a complex 
system with many parameters to tune, and whatever we write in such a document 
will be taken "literally": people will rely on it and complain if what we 
describe here "does not work the exact way it is described". I think we should 
be very clear about setting expectations for this document, and be very firm 
and even "assertive" here. We will not give people the "exact" answers they are 
looking for when it comes to performance. They will get the "knobs", 
information on what they should generally pay attention to, and the general 
impact of the "knobs".
   
   But we cannot do this FOR the users who maintain Airflow instances. It's 
their job to fine-tune it. Airflow is not a self-tuning system (it could be, but 
this would be a huge effort, and one that often takes years to master for a 
managed service - see for example Kubernetes auto-scaling in GCP). In order to 
do the job, they need to experiment with their deployment. We will never 
answer "What configuration do we need to achieve this and that performance 
with the kind of DAGs we have?". People WANT that answer, of course. And it is 
asked many times. But they will never get that answer from us.
   
   We just give them information about what they can do to get there, but they 
still need to learn, understand the knobs and experiment a bit with them. 
Knowing which knobs to turn, and in which direction, to get a certain effect 
is the exact purpose of the document.
   



