potiuk commented on PR #42572: URL: https://github.com/apache/airflow/pull/42572#issuecomment-2514182293
> I'll go back to the list with this feedback, but this implementation is tantamount to a parallel scheduler, executor and triggerer implementation, so it is very unlikely to be accepted into core. Looking forward to the discussion :).

I am not strongly for or against this kind of operator, but I see a class of use cases that might be very interesting in the near future: running independent "Airflow" tasks on the same machine, taking advantage of the fact that those independent tasks can keep the data they are working on in memory. This could leverage things like Apache Arrow to enable zero-copy optimizations, along with the fact that many existing tools and libraries already support it. It also seems to fit very nicely the case where multiple tasks use different libraries to run complex (and sometimes parallel, but on the same machine) workflows on data loaded into CPU and GPU memory.

This has been mentioned multiple times in the past in various forms ("task affinity" seems to be the term that best fits the need). I think the streamed operator as currently defined does not implement it in the best way, but I would not rule out that we will get something there sooner or later. IMHO, we are on the verge of users looking at very aggressive optimizations in this space. While it could all be done by writing TaskFlow tasks, there is value in being able to see and observe those tasks, define dependencies between them and parallelise them via Airflow mechanisms. I am not sure if "streamed operator" is the right abstraction for it, and maybe we will decide this is **not** interesting for the community, but I think we should definitely discuss it and see if we can do something there beyond Airflow 3.

>> Airflow is not a processing tool (stream or batch)

> I very strongly disagree with this statement.

Me too.
