Re: CallbackServer in PySpark Streaming

Todd Gao Wed, 11 Feb 2015 17:46:33 -0800

Thanks Davies.
I am not very familiar with Spark Streaming. Do you mean that the compute
routine of a DStream is invoked only on the driver node,
while only the compute routines of the RDDs are distributed to the slaves?
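
To make the question concrete, here is a toy model of the split as I
understand it. It is only a sketch in plain Python, not Spark or Py4J code:
the thread pool stands in for the slaves, the calling thread stands in for
the driver, and the names `per_record`, `compute`, `workers`, and `calls`
are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Toy model only: none of these names are Spark or Py4J APIs.
driver_name = threading.current_thread().name
workers = ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker")
calls = []  # (role, thread name) recorded for every function invocation


def per_record(x):
    # Stands in for an RDD-level function: runs on the "workers".
    calls.append(("record", threading.current_thread().name))
    return x * 2


def compute(batch):
    # Stands in for the DStream-level compute routine: called once per
    # batch on the "driver"; it only hands the per-record work to the pool.
    calls.append(("compute", threading.current_thread().name))
    return list(workers.map(per_record, batch))


# Two batches, each processed by one driver-side compute call.
results = [compute(batch) for batch in ([1, 2], [3, 4])]
workers.shutdown()
```

If this model matches the real design, only `compute` would ever need the
CallbackServer, since it is the only function that runs next to the JVM
driver.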


On Thu, Feb 12, 2015 at 2:38 AM, Davies Liu <[email protected]> wrote:

> The CallbackServer is part of Py4J; it is used only in the driver, not
> in slaves or workers.
>
> On Wed, Feb 11, 2015 at 12:32 AM, Todd Gao
> <[email protected]> wrote:
> > Hi all,
> >
> > I am reading the code of PySpark and its Streaming module.
> >
> > In PySpark Streaming, when the `compute` method of the instance of
> > PythonTransformedDStream is invoked, a connection to the CallbackServer
> > is created internally.
> > I wonder where the CallbackServer for each PythonTransformedDStream
> > instance is on the slave nodes in a distributed environment.
> > Is there a CallbackServer running on every slave node?
> >
> > thanks
> > Todd
>
