EliMor commented on issue #17490:
URL: https://github.com/apache/airflow/issues/17490#issuecomment-894750630


   Hi there! 
   
   Thanks for your feedback. I admit I'd need to take a little more time to 
look at 'pod_template_file'. My memory is foggy, but these trees look familiar 
to me. 
   
   To clarify, there are a few things I wanted to ensure our use of yaml + Jinja 
with the KJO would accomplish for free:
   
   1. We could pass in the location of the yaml template files just as we can 
for other templates (template_searchpath).
   2. We could move away entirely from using Python objects for kube-related 
things; I never want to import k8s in a DAG (as you noted!).
   3. We could pass in variables to the KJO to be rendered by Jinja in the yaml 
template.
   4. With a little Jinja magic, we could reuse **_multiple_** yaml templates to 
render a **_single_** Job (Pod) yaml file, similar to how one would compose 
templates for web work. 
   
   If 'pod_template_file' accomplishes this, I'm a happy camper, albeit a very 
confused one.
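   
   For concreteness, here's a rough sketch of the kind of yaml + Jinja template 
I have in mind for items 3 and 4 (the file names, params, and partials are all 
made up, and I haven't yet verified that 'pod_template_file' actually runs the 
file through Jinja; that's part of the homework below):
   
   ```yaml
   # worker_pod.yaml: hypothetical Jinja-templated pod spec
   apiVersion: v1
   kind: Pod
   metadata:
     name: "{{ params.job_name }}-worker"
   spec:
     restartPolicy: Never
     containers:
       - name: main
         image: "myrepo/worker:{{ params.image_tag }}"  # item 3: value injected by Jinja
         # item 4: compose the spec from smaller reusable templates,
         # as one would with include blocks in web templating
         # (the partials would need to carry matching indentation)
         {% include 'partials/resources.yaml' %}
         {% include 'partials/env_vars.yaml' %}
   ```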
   
   As for why a Job and not a Pod: to my (limited) knowledge of kube, the 
extra abstraction of the 'Job' type also allows for parallelism out of the box 
(see [Kube Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/)). 
If I have a use case where I want 10 pods to run simultaneously, would I 
need to surface that at the task level in an Airflow DAG? Doesn't that consume 
more resources on the Airflow side than letting Airflow manage a single 
Job abstraction as a single task and deferring to kube to handle the pods? 
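   
   To illustrate the parallelism bit, a plain Kubernetes Job manifest (nothing 
Airflow-specific here, and the names/image are made up) lets kube fan the pods 
out while Airflow would only ever see the one task:
   
   ```yaml
   # Hypothetical Job manifest: the Job controller runs the pods,
   # Airflow would only track the single Job-backed task.
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: fanout-job
   spec:
     parallelism: 10   # run up to 10 pods at the same time
     completions: 10   # the Job succeeds once 10 pods complete successfully
     backoffLimit: 4   # retries handled by kube, not the Airflow scheduler
     template:
       spec:
         restartPolicy: Never
         containers:
           - name: worker
             image: myrepo/worker:1.2.3
   ```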
   
   Totally understand not wanting to add confusion. I'm confused more than half 
the time I try anything these days! 
   Homework assignment for me: reinvestigate the limitations of 
pod_template_file. 
   
   For one, I do recall experiencing some bugs with how logs were being 
forwarded from the pod to Airflow using the KPO. If the pod just slept 
for a minute or so and then completed, whatever was tracking the pod for log 
streaming seemed to just drop off, and the task would hang without ever 
completing. 
   Entirely different issue, and possibly resolved by now! 
   
   If there's anywhere else I could offer some clarity or otherwise be helpful, 
please let me know! 

