potiuk commented on PR #42572:
URL: https://github.com/apache/airflow/pull/42572#issuecomment-2514182293

   > I'll go back to the list with this feedback, but this implementation is 
tantamount to a parallel scheduler, executor and triggerer implementation so is 
very unlikely to be accepted in to core.
   
   Looking forward to the discussion :).  
   
   I am not strongly for or against this kind of operator, but I see a class of 
use cases that might be very interesting in the near future: running 
independent "Airflow" tasks on the same machine, using the fact that those 
independent tasks could store the data they are working on in memory, and 
leveraging things like Apache Arrow to enable zero-copy optimizations and the 
fact that many existing tools and libraries already support it. 
   
   And it seems to fit very nicely the case where multiple tasks 
might be using different libraries to run complex workflows (sometimes 
parallel, but on the same machine) using that data loaded in CPU and GPU 
memory. This has been mentioned multiple times in the past in various forms 
("task affinity" is the one that seems to fit the need best), and I think the 
streamed operator as defined now does not really implement it in the best 
way, but I would not exclude that we will get something there sooner or later.
   
   IMHO, we are on the verge of users looking at very aggressive optimizations 
in this space, and while it all could be done by writing TaskFlow tasks, 
there is value in being able to see and observe those tasks, give them 
dependencies, and parallelise them via Airflow mechanisms. I am not sure if 
"streamed operator" is the right abstraction for it, and maybe we could decide 
this is **not** interesting for the community, but I think we should 
definitely discuss it and see if we can do something there beyond Airflow 3.
   
   >>  Airflow is not a processing tool ( stream or batch )
   
   > I very strongly disagree with this statement.
   
   Me too. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

