derkuci commented on pull request #19157:
URL: https://github.com/apache/airflow/pull/19157#issuecomment-1001737855


   I hit the same problem when upgrading from 1.9 to 2.2.  I couldn't figure 
out a fix and had to give up all use of `run_as_user` (I had to 
rewrite/rearrange the tasks).  That's a shame.
   
   I tried to go through the airflow code, but with very limited knowledge 
of its architectural assumptions, I didn't make much progress.  All I can guess is 
that there's an inconsistency between how a task identifies itself and how it 
communicates (heartbeat) with the scheduler/db/whatever.  Your proposed code 
change seems a good start, but doesn't fully resolve the inconsistency.
   
   For example, with Celery, the process hierarchy with run_as_user is
   ```
      celery worker process
        \-- (forked) task process
              \-- sudo process
   ```
   In `LocalTaskJob.heartbeat_callback()` I've seen 
`ti.pid=<task process pid>` and `current_pid=<sudo process pid>`.  With 
`run_as_user`, the comparison is actually between the ppid of `ti.pid`, i.e. 
`<celery worker process pid>`, and `<sudo process pid>`, which I couldn't 
understand at all: those two can never be equal in the hierarchy above.
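
To make that comparison concrete, here is a small self-contained Python model of it. This is only a sketch of the check as described above, not Airflow code: the pids are made up, and `pids_match` and `PARENT_OF` are hypothetical names introduced for illustration.

```python
# Hypothetical pids standing in for the hierarchy above (not from a real run).
CELERY_WORKER_PID = 100
TASK_PID = 200   # forked task process; the value observed in ti.pid
SUDO_PID = 300   # sudo process; the value observed in current_pid

# child pid -> parent pid, mirroring the process tree shown earlier
PARENT_OF = {TASK_PID: CELERY_WORKER_PID, SUDO_PID: TASK_PID}


def pids_match(recorded_pid, current_pid, run_as_user, parent_of):
    """Model of the comparison: with run_as_user, the parent of the
    recorded pid is compared against current_pid instead of the
    recorded pid itself."""
    if run_as_user:
        recorded_pid = parent_of[recorded_pid]
    return recorded_pid == current_pid


# Observed case: celery worker pid (100) is compared with sudo pid (300).
observed = pids_match(TASK_PID, SUDO_PID, run_as_user=True, parent_of=PARENT_OF)

# Comparing the raw pids does not match either (200 vs 300).
raw = pids_match(TASK_PID, SUDO_PID, run_as_user=False, parent_of=PARENT_OF)

# For contrast: in this model the run_as_user branch only matches when the
# recorded pid is a direct child of current_pid (hypothetical pid 400).
CHILD_OF_SUDO_PID = 400
contrast = pids_match(
    CHILD_OF_SUDO_PID,
    SUDO_PID,
    run_as_user=True,
    parent_of={**PARENT_OF, CHILD_OF_SUDO_PID: SUDO_PID},
)
```

In this model both `observed` and `raw` are `False`, which is the mismatch described above; only the contrast case, where the recorded pid sits one level below the sudo process, evaluates to `True`.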
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

