We use multiprocessing queues in both the Celery and Kube executors too :D -a
> On 29 Apr 2019, at 22:51, Driesprong, Fokko <[email protected]> wrote: > > Thanks for the explanation. I'd go for the manager as well. In the end, if > you really want to scale out, you should go for Celery/Kubernetes. > > Cheers, Fokko > > Op ma 29 apr. 2019 om 23:09 schreef Jarek Potiuk <[email protected]>: > >> Yep. Manager() creates a SyncManager instance and runs start(). >> This in turn creates a Manager process that all processes communicate with >> via Proxies: >> >> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#sharing-state-between-processes >> >> It's reliable, but slower than Shared Memory used by direct Queue >> instantiation. >> >> J. >> >> On Mon, Apr 29, 2019 at 10:26 PM Bas Harenslak < >> [email protected]> wrote: >> >>> @Fokko I believe that’s implicitly created by multiprocessing.Manager()< >>> >> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.sharedctypes.multiprocessing.Manager >>>> : >>> >>> ….. The returned manager object corresponds to a spawned child process >> and >>> …. >>> >>> I’m in favour of the Manager route, since this doesn’t introduce >>> additional complex multiprocessing code in Airflow. I’m reviewing right >> now >>> although it’s a bit out of my comfort zone... >>> >>> Bas >>> >>> On 29 Apr 2019, at 21:46, Driesprong, Fokko <[email protected] >> <mailto: >>> [email protected]>> wrote: >>> >>> I'm missing the part of another process? This is within the Scheduler >>> process if I understand correctly. >>> >>> Cheers, Fokko >>> >>> Op ma 29 apr. 2019 om 21:33 schreef Jarek Potiuk < >> [email protected] >>> <mailto:[email protected]>>: >>> >>> I am also leaning towards the manager. I updated the >>> https://github.com/apache/airflow/pull/5200 PR now after review and once >>> it >>> passes CI I think we can merge it. >>> If anyone wants to have a look as well, happy to hear it. >>> >>> J. >>> >>> On Mon, Apr 29, 2019 at 2:14 PM Ash Berlin-Taylor <[email protected] >> <mailto: >>> [email protected]>> wrote: >>> >>> I think I lean towards the built-in/manager approach as it is less >>> concurrency code we have to manage/maintain in Airflow, though I'm not >>> hugely happy about another process :( >>> >>> -ash >>> >>> On 29 Apr 2019, at 07:33, Jarek Potiuk <[email protected]<mailto: >>> [email protected]>> >>> wrote: >>> >>> Hello Everyone, >>> >>> I think we need some more pairs of eyes to take a look at potential >>> fixes >>> we have for the pesky LocalExecutorTest that we are all experiencing >>> with >>> our Travis builds. Once we solve it I think we should be much closer to >>> have stable builds - including some other flaky test fixes merged >>> recently. >>> >>> It turned out that the problem relates to quite deep internals of how >>> data >>> is passed between processes using multiprocessing queues. It's really >>> deep >>> in the core processing of Airflow so I think it would be great if also >>> other experienced Airflowers review and comment it and help to select >>> the >>> best solution as we could have missed something. >>> >>> I was looking at it together with Ash and Bas and I (a bit too fast) >>> merged >>> a preliminary version of the fix last week. We reverted it later as it >>> turned out to have some side effects, so we know we have to be careful >>> with >>> this one. >>> >>> After more detailed analysis and discussions with Omar, we have now two >>> potential candidates to fix it. Both are green and from local testing - >>> both are solving the problem in a different way. >>> >>> - https://github.com/apache/airflow/pull/5199 >>> - https://github.com/apache/airflow/pull/5200 >>> >>> I tried to describe the problem, solution candidates with Pros and Cons >>> in >>> the JIRA ticket : >>> https://issues.apache.org/jira/browse/AIRFLOW-4401 >>> >>> I'd love if we can get reviews in the PRs and input to discussion on >>> which >>> solution to choose. >>> >>> J. >>> >>> >>> -- >>> >>> Jarek Potiuk >>> Polidea <https://www.polidea.com/> | Principal Software Engineer >>> >>> M: +48 660 796 129 <+48660796129> >>> E: [email protected]<mailto:[email protected]> >>> >>> >>> >>> -- >>> >>> Jarek Potiuk >>> Polidea <https://www.polidea.com/> | Principal Software Engineer >>> >>> M: +48 660 796 129 <+48660796129> >>> E: [email protected]<mailto:[email protected]> >>> >>> >>> >> >> -- >> >> Jarek Potiuk >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> M: +48 660 796 129 <+48660796129> >> E: [email protected] >>
