As long as that code is serializable (through pickle, cloudpickle, or any other Python code serializer), the answer should be yes.
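To make the serializability condition concrete, here is a minimal sketch using only the standard library's pickle. The helper name `is_serializable` and the sample callables are made up for illustration; nothing here is Airflow or Spark API.

```python
import functools
import operator
import pickle


def is_serializable(task):
    """Return True if `task` can be pickled, i.e. could be shipped to a remote worker."""
    try:
        pickle.dumps(task)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A callable built only from importable pieces is pickled by reference.
triple = functools.partial(operator.mul, 3)

# Plain pickle rejects lambdas because they have no importable name.
anonymous = lambda x: x * 3

# Round trip: roughly what a remote worker does after receiving the bytes.
restored = pickle.loads(pickle.dumps(triple))
result = restored(14)
```

Plain pickle stores functions by reference, so it only covers code the worker can import by name. cloudpickle serializes functions by value, which is why it can also handle lambdas and closures; whether that is enough for every Airflow operator is an open question in this thread.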
Thanks.

--
Ruslan Dautkhanov

On Wed, Apr 25, 2018 at 9:54 AM, Taylor Edmiston <[email protected]> wrote:

> Is it possible for the (hypothetical) Airflow SparkExecutor to handle
> general execution of any operator (i.e., run non-Spark code)?
>
> *Taylor Edmiston*
> Blog <http://blog.tedmiston.com> | Stack Overflow CV
> <https://stackoverflow.com/story/taylor> | LinkedIn
> <https://www.linkedin.com/in/tedmiston/> | AngelList
> <https://angel.co/taylor>
>
> On Wed, Apr 25, 2018 at 11:22 AM, Ruslan Dautkhanov <[email protected]>
> wrote:
>
> > I used "Executor" as an Airflow term; I did not mean a Spark executor ...
> > Spark would be one of the Executors in here
> > https://github.com/apache/incubator-airflow/tree/master/airflow/executors
> > or in here
> > https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/executors
> >
> > Thanks.
> >
> > --
> > Ruslan Dautkhanov
> >
> > On Wed, Apr 25, 2018 at 9:17 AM, Bolke de Bruin <[email protected]> wrote:
> >
> > > I'm a bit lost on the spark executor to be honest. To my knowledge the
> > > spark driver creates spark executors which run spark code. In other
> > > words it can't arbitrarily run generic code. Or can it?
> > >
> > > B.
> > >
> > > Sent from my iPad
> > >
> > > > On 25 Apr 2018, at 17:11, Ruslan Dautkhanov <[email protected]>
> > > > wrote:
> > > >
> > > > Now I think Airflow on a PySpark Executor would be an easier target.
> > > > Spark runs on YARN, Mesos and now Kubernetes.
> > > > So a PySpark Executor would give Airflow a port to these schedulers.
> > > > It's my understanding we now have only a Spark Operator and not an
> > > > Executor.
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Ruslan Dautkhanov
> > > >
> > > >> On Tue, Apr 24, 2018 at 3:20 PM, Ace Haidrey <[email protected]> wrote:
> > > >>
> > > >> Hey I didn't know this Bolke, I was under the same impression as
> > > >> Ruslan.
> > > >> Thanks for the share
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >>> On Apr 24, 2018, at 2:12 PM, Bolke de Bruin <[email protected]> wrote:
> > > >>>
> > > >>> It actually can nowadays:
> > > >>> https://cdn.oreillystatic.com/en/assets/1/event/269/HDFS%20on%20Kubernetes_%20Tech%20deep%20dive%20on%20locality%20and%20security%20Presentation.pptx
> > > >>>
> > > >>> We also have an on premise setup with ceph (s3a) and HDFS for when we
> > > >>> need the speed and kubernetes for our workloads. We are kicking out
> > > >>> Yarn (and hive etc for that matter).
> > > >>>
> > > >>> Bolke
> > > >>>
> > > >>> Sent from my iPad
> > > >>>
> > > >>>> On 24 Apr 2018, at 22:50, Ruslan Dautkhanov <[email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>> Kubernetes is a "monolithic" 1-level scheduler that can't handle what
> > > >>>> YARN can - for example schedule tasks local to data.
> > > >>>> Hadoop has multiple levels of data locality (node-local, rack-local) -
> > > >>>> so computation happens local to data to minimize network
> > > >>>> data transfer, which is expensive.
> > > >>>> K8s wasn't designed to handle these scheduling scenarios, as far as I
> > > >>>> know.
> > > >>>>
> > > >>>> For cloud deployments where we don't have the data locality problem
> > > >>>> (because s3 is being used instead of storage local
> > > >>>> to servers), k8s might be okay.
> > > >>>>
> > > >>>> Nice comparison [1] of k8s vs two-level schedulers like YARN and
> > > >>>> Mesos .. although I think it's off-topic.
> > > >>>>
> > > >>>> We're mostly on-prem and we don't see kubernetes taking over yarn any
> > > >>>> time soon.
> > > >>>>
> > > >>>> Thanks.
> > > >>>>
> > > >>>> [1]
> > > >>>> https://aaltodoc.aalto.fi/bitstream/handle/123456789/27061/master_Ravula_Shashi_2017.pdf?sequence=1
> > > >>>>
> > > >>>> *2.3.2 Monolithic Schedulers*
> > > >>>>
> > > >>>> Monolithic schedulers use a single, centralized scheduling algorithm
> > > >>>> for all jobs. All workload is run through the same scheduler and same
> > > >>>> scheduling logic. Swarm, Fleet, Borg and Kubernetes adopt monolithic
> > > >>>> schedulers. Kubernetes improvised on basic monolithic version of Borg
> > > >>>> and Swarm schedulers. This type of schedulers are not suitable for
> > > >>>> running heterogeneous modern workloads which include Spark jobs,
> > > >>>> containers, and other long running jobs, etc.
> > > >>>>
> > > >>>> *2.3.3 Two Level Schedulers*
> > > >>>>
> > > >>>> Two-level schedulers address the drawbacks of a monolithic scheduler
> > > >>>> by separating concerns of resource allocation and task placement. An
> > > >>>> active resource manager offers compute resources to multiple parallel,
> > > >>>> independent “scheduler frameworks”. The Mesos cluster manager
> > > >>>> pioneered this approach, and YARN supports a limited version of it. In
> > > >>>> Mesos, resources are offered to application-level schedulers. This
> > > >>>> allows for custom, workload-specific scheduling policies. The drawback
> > > >>>> with this type of scheduling architecture is that the application
> > > >>>> level frameworks cannot see all the possible placement options
> > > >>>> anymore. Instead, they only see those options that correspond to
> > > >>>> resources offered (Mesos) or allocated (YARN) by the resource manager
> > > >>>> component. This makes priority preemption (higher priority tasks kick
> > > >>>> out lower priority ones) difficult.
> > > >>>>
> > > >>>> --
> > > >>>> Ruslan Dautkhanov
> > > >>>>
> > > >>>>> On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> Happy to have it as a contrib executor. However, I personally think
> > > >>>>> yarn is a dead end. It has a lot of catching up to do and all the
> > > >>>>> momentum is with kubernetes.
> > > >>>>>
> > > >>>>> B.
> > > >>>>>
> > > >>>>> Sent from my iPad
> > > >>>>>
> > > >>>>>> On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <[email protected]>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>> With Hadoop 3's Docker on YARN support, I think YARN becomes
> > > >>>>>> somewhat a competitor for Kubernetes.
> > > >>>>>>
> > > >>>>>> Great job on adding k8s support to Airflow.
> > > >>>>>>
> > > >>>>>> Very similarly I see Airflow could integrate with YARN and use
> > > >>>>>> its infrastructure as an "executor" .. has anyone explored the
> > > >>>>>> feasibility of this approach?
> > > >>>>>>
> > > >>>>>> Thanks!
> > > >>>>>> Ruslan Dautkhanov
