I think the original reason is how Python evaluates an assignment. At the moment we create the task, the variable name is not known: first the task is created as an object, and only then is the result assigned to a variable. I think we have no super-reliable way (unless there is some wild Python trickery) to find out which variable the object is being assigned to, and my gut feeling is that there will be cases where any such attempt fails.

Theoretically, you could parse the AST of your Python DAG file and look up the name of the variable that is going to be assigned to - see this SO question: https://stackoverflow.com/questions/18425225/getting-the-name-of-a-variable-as-a-string. But I am afraid that finding the right assignment in the AST in the general case (including nested frames, built-ins handling, etc.) would be very error-prone, or even impossible in some cases. It would also almost certainly be much slower, because you would have to access and traverse the AST of the DAG file that is being executed right now, find the proper assignment, and get the variable name from there.
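To make the AST idea a bit more concrete, here is a minimal sketch. It is not something Airflow does, and it is a simplification: it statically parses a DAG-like source string rather than inspecting the running frame. DAG_SOURCE, infer_task_names and the operator names inside the string are only illustrative assumptions. It handles the simple "name = Operator(...)" case, and it also shows a few of the cases where there is no single obvious name to pick.

```python
import ast

# Hypothetical DAG snippet; it is only parsed, never executed,
# so the operator names do not need to be importable.
DAG_SOURCE = '''
train_model = SageMakerTrainingOperator(
    config=TRAINING_CONFIG,
)
a = b = DummyOperator()
tasks = [DummyOperator() for _ in range(3)]
DummyOperator() >> train_model
'''


def infer_task_names(source):
    """Map the line number of each directly assigned call to its target name(s)."""
    names = {}
    for node in ast.walk(ast.parse(source)):
        # Only plain "x = Call(...)" statements carry a usable name.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            names[node.value.lineno] = targets
    return names


inferred = infer_task_names(DAG_SOURCE)
print(inferred)
```

The first call gets one unambiguous name, the chained assignment gets two candidates, and the operators created inside the list comprehension or used directly in the ">>" expression get no name at all, so they would still need an explicit task_id.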
I am not sure if it is worth it but maybe someone would like to prototype it and run it on many DAGs to see if this could be a viable option?

J.

On Wed, Apr 27, 2022 at 5:31 PM Ferruzzi, Dennis <[email protected]> wrote:
>
> Hi folks, I'm hoping for a little history lesson. I'm idly wondering if
> there is a way to make a fairly big change (for me), but want to understand
> the reason it is the way it is now, before I go and put much time into
> "fixing" it.
>
> Every time I write a DAG it bugs me that we have to essentially name a task
> twice and I'm thinking of proposing/implementing the change. For example:
>
> train_model = SageMakerTrainingOperator(
>     task_id='train_model',
>     config=TRAINING_CONFIG,
> )
>
> I'd love to see the task_id default to the task's variable name. It's
> exceedingly rare in my DAGs for those two values not to be identical and it
> catches me from time to time forgetting to state the task_id. But maybe
> there is a reason this is the way it works, or maybe my personal experiences
> are just too limited to see why this is a Bad Idea.
