Oh, I see! Thank you very much, Davies. You have corrected some of my misunderstandings.
On Thu, Feb 12, 2015 at 9:50 AM, Davies Liu <dav...@databricks.com> wrote:
> Yes.
>
> On Wed, Feb 11, 2015 at 5:44 PM, Todd Gao <todd.gao.2013+sp...@gmail.com> wrote:
> > Thanks, Davies.
> > I am not very familiar with Spark Streaming. Do you mean that the `compute`
> > routine of a DStream is invoked only on the driver node,
> > while only the `compute` routines of RDDs are distributed to the slaves?
> >
> > On Thu, Feb 12, 2015 at 2:38 AM, Davies Liu <dav...@databricks.com> wrote:
> >>
> >> The CallbackServer is part of Py4J; it is only used in the driver, not
> >> in slaves or workers.
> >>
> >> On Wed, Feb 11, 2015 at 12:32 AM, Todd Gao
> >> <todd.gao.2013+sp...@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I am reading the code of PySpark and its Streaming module.
> >> >
> >> > In PySpark Streaming, when the `compute` method of a
> >> > PythonTransformedDStream instance is invoked, a connection to the
> >> > CallbackServer is created internally.
> >> > I wonder where the CallbackServer for each PythonTransformedDStream
> >> > instance lives on the slave nodes in a distributed environment.
> >> > Is there a CallbackServer running on every slave node?
> >> >
> >> > Thanks,
> >> > Todd
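To make the driver/worker split concrete, here is a toy, pure-Python model of the behaviour confirmed in this thread. It is a sketch, not Spark's API: `ToyRDD`, `ToyTransformedDStream` and the `calls` log are hypothetical stand-ins, and a thread pool plays the role of the worker executors. The DStream-level `compute` runs once per batch on the driver side (in real PySpark this is roughly where the JVM calls back into the Python driver through the Py4J CallbackServer), while only the per-partition RDD work is handed to the "workers".

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only (NOT Spark's API). `calls` records where each step ran.
calls = []


class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions plus a lazy per-element function."""

    def __init__(self, partitions, fn=lambda x: x):
        self.partitions = partitions
        self.fn = fn

    def map(self, f):
        # Compose functions without running anything (mimics lazy lineage).
        g = self.fn
        return ToyRDD(self.partitions, lambda x: f(g(x)))

    def collect(self, pool):
        # Only here is work shipped to the "workers": one task per partition.
        def run_partition(part):
            calls.append(("worker", "rdd compute"))
            return [self.fn(x) for x in part]

        results = pool.map(run_partition, self.partitions)
        return [x for part in results for x in part]


class ToyTransformedDStream:
    """Hypothetical stand-in for a transformed DStream holding a user function."""

    def __init__(self, transform_func):
        self.transform_func = transform_func

    def compute(self, batch_rdd):
        # Runs on the driver for every batch. In real PySpark the user's
        # Python function is reached via a Py4J callback; here it is a plain call.
        calls.append(("driver", "dstream compute"))
        return self.transform_func(batch_rdd)


stream = ToyTransformedDStream(lambda rdd: rdd.map(lambda x: x * 10))
outputs = []
with ThreadPoolExecutor(max_workers=2) as pool:
    for batch in ([[1, 2], [3]], [[4], [5, 6]]):  # two batches, two partitions each
        outputs.append(stream.compute(ToyRDD(batch)).collect(pool))

print(outputs)
```

Running it logs two driver-side `compute` calls (one per batch) and four worker-side partition tasks. In this model the workers never need to call back into the driver's Python code, which is consistent with the answer above that no CallbackServer is needed on the slave nodes.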