rafalh commented on PR #28970:
URL: https://github.com/apache/airflow/pull/28970#issuecomment-1408402968

   @Taragolis @eladkal Please review my change. Note that this change makes 
migration from old job operators like `DataprocSubmitPySparkJobOperator` 
(which are deprecated according to the code in `__init__`) to 
`DataprocSubmitJobOperator` easier. `DataprocSubmitPySparkJobOperator` 
generates the job name from the task ID (`ti.task_id`). When using 
`DataprocSubmitJobOperator` with `DataProcJobBuilder`, deriving the job name 
from the task ID is currently not possible without workarounds; e.g. this 
code won't work:
   ```python
   task1 = DataprocSubmitJobOperator(
       task_id="foo",
       job=DataProcJobBuilder(task_id="{{ task.task_id }}", ...).build(),
       ...
   )
   ```
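The snippet fails because of ordering: the builder processes the job name at DAG-definition time, before Airflow renders Jinja templates at task run time, so sanitization sees the literal string `{{ task.task_id }}` rather than the resolved task ID. A minimal sketch of that ordering (the `sanitize` function here is illustrative, not Airflow's actual code, and the replaced character set is an assumption):

```python
import re

def sanitize(name: str) -> str:
    # Illustrative stand-in for the #23791-style sanitization:
    # silently replace any character Dataproc would reject.
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

# build() effectively runs first, at DAG-definition time...
built_name = sanitize("{{ task.task_id }}")

# ...and Jinja rendering would only happen later, at run time, so the
# placeholder has already been mangled before it could be resolved.
print(built_name)  # ___task_task_id___
```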
   Of course I could duplicate the task ID, but I shouldn't have to, 
considering that templates work in the `job` field and there was no problem 
with using templates in the job name until #23791 was merged.
   My PR deals with this problem by removing job name sanitization and fixing 
the default value provided by the old job operators when a grouped task is 
used. It should still fix the problem that #23791 addressed, while bringing 
back the old behavior in `DataProcJobBuilder`.
   Strictly speaking it is not backward compatible: if a user uses 
`DataProcJobBuilder` directly and passes a broken name (e.g. one containing a 
dot) in the `task_id` field of the constructor, or via `set_job_name`, it 
will stop working. However:
   1. this was the old behavior and no one complained until task groups were 
introduced, so I don't think it will affect users
   2. a good API should fail on invalid input, not try to silently fix it
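Point 2 in code form: a fail-fast check instead of silent rewriting (the exact character set and length limit Dataproc accepts are my assumptions here, not taken from the PR):

```python
import re

# Assumed Dataproc job-ID rule: letters, digits, hyphens, underscores,
# at most 100 characters. Adjust if the real constraint differs.
JOB_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,100}$")

def validate_job_name(name: str) -> str:
    """Reject invalid names loudly rather than rewriting them."""
    if not JOB_ID_RE.fullmatch(name):
        raise ValueError(f"invalid Dataproc job name: {name!r}")
    return name

validate_job_name("foo_task")  # passes through unchanged

try:
    validate_job_name("group.task")  # dotted name from a task group
except ValueError as e:
    print(e)  # the caller is told to fix the name, nothing is mangled
```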


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
