Re: task_id declarations

Jarek Potiuk Wed, 27 Apr 2022 10:10:27 -0700

I think the original reason is how Python evaluates an assignment. At the
moment we create the task, the variable name is not known: first the
task is created as an object, and only then is the result assigned to a
variable. And I think we have no super-reliable way (unless there is
some wild Python trickery) to find out which variable is actually
being assigned to (and my gut feeling is that there might be cases
that will make our attempt fail). Theoretically, if you parse the
AST of your Python DAG, you could potentially find out the name of the
variable that is going to be assigned to - see this SO question:
https://stackoverflow.com/questions/18425225/getting-the-name-of-a-variable-as-a-string.
But I am afraid finding the right assignment in the AST in the general
case (including nested frames, built-ins handling etc.) would be
either very error-prone or even impossible in some cases. And for sure
it would be much slower, because you would have to access and traverse
the AST of the Python DAG being executed right now, find the proper
assignment and get the name of the variable from there.
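
To make the AST idea concrete, here is a minimal sketch (not something
Airflow does today). It only parses a DAG source string and handles the
simplest `name = SomeCall(...)` form. `assigned_names` and `DAG_SOURCE`
are made-up names for illustration, and `SomeOperator` is a hypothetical
operator. A real implementation would also have to map the constructor
call that is running back to the right line, e.g. via the caller's
frame, which is where I would expect it to get fragile.

```python
import ast


def assigned_names(dag_source: str) -> dict:
    """Map the line of each `name = Call(...)` statement to `name`.

    Illustrative only: anything other than a single plain-name target
    assigned directly from a call is skipped.
    """
    names = {}
    for node in ast.walk(ast.parse(dag_source)):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1                # rules out `a = b = ...`
            and isinstance(node.targets[0], ast.Name)  # rules out `x.y = ...`, tuples
            and isinstance(node.value, ast.Call)       # rules out comprehensions etc.
        ):
            names[node.value.lineno] = node.targets[0].id
    return names


# The source is only parsed, never executed, so the operator classes
# do not need to exist. SomeOperator is a hypothetical name.
DAG_SOURCE = '''\
train_model = SageMakerTrainingOperator(
    config=TRAINING_CONFIG,
)
tasks = [SomeOperator() for _ in range(3)]
a = b = SomeOperator()
'''

print(assigned_names(DAG_SOURCE))  # {1: 'train_model'}
```

Only the first statement yields a name. The list comprehension and the
chained assignment are skipped, and those are exactly the kind of cases
where there is no single variable name to fall back on.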


I am not sure if it is worth it but maybe someone would like to
prototype it and run it on many DAGs to see if this could be a viable
option?
J.

On Wed, Apr 27, 2022 at 5:31 PM Ferruzzi, Dennis
<[email protected]> wrote:
>
> Hi folks, I'm hoping for a little history lesson.  I'm idly wondering if 
> there is a way to make a fairly big change (for me), but want to understand 
> the reason it is the way it is now, before I go and put much time into 
> "fixing" it.
>
> Every time I write a DAG it bugs me that we have to essentially name a task 
> twice and I'm thinking of proposing/implementing the change.  For example:
>
>
>     train_model = SageMakerTrainingOperator(
>         task_id='train_model',
>         config=TRAINING_CONFIG,
>     )
>
>
> I'd love to see the task_id default to the task's variable name.  It's
> exceedingly rare in my DAGs for those two values not to be identical, and
> from time to time I get caught out forgetting to set the task_id.  But maybe
> there is a reason this is the way it works, or maybe my personal experiences
> are just too limited to see why this is a Bad Idea.
