You can look into the PythonRDD source code:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
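
If it helps while reading that file, here is a minimal sketch of the general idea: two sides exchanging length-prefixed records over a local socket. It is not Spark's actual code. The function names, the END_OF_DATA marker value, and the use of socket.socketpair() plus a thread in place of a real JVM executor and a separate Python worker process are all assumptions made for illustration. As far as I understand, PySpark's framed serializers use a similar "4-byte length, then payload" layout, but please check the source for the real protocol.

```python
import socket
import struct
import threading

# Hypothetical end-of-data marker for this sketch (sent in place of a length).
END_OF_DATA = -1


def write_frame(stream, payload):
    """Write one record as a 4-byte big-endian length followed by the bytes."""
    stream.write(struct.pack("!i", len(payload)))
    stream.write(payload)


def write_end(stream):
    """Signal that no more records follow, then flush the buffered writes."""
    stream.write(struct.pack("!i", END_OF_DATA))
    stream.flush()


def read_exact(stream, n):
    """Read exactly n bytes or raise if the peer closed the connection early."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError("connection closed in the middle of a frame")
    return data


def read_frames(stream):
    """Yield records until the end-of-data marker is seen."""
    while True:
        (length,) = struct.unpack("!i", read_exact(stream, 4))
        if length == END_OF_DATA:
            return
        yield read_exact(stream, length)


def worker(conn):
    """Stand-in for a Python worker: read records, send back transformed ones."""
    with conn, conn.makefile("rwb") as stream:
        for record in read_frames(stream):
            write_frame(stream, record.upper())  # the "user function"
        write_end(stream)


def run_demo(records):
    """Stand-in for the executor side: send records, collect the results."""
    executor_side, worker_side = socket.socketpair()
    thread = threading.Thread(target=worker, args=(worker_side,))
    thread.start()
    with executor_side, executor_side.makefile("rwb") as stream:
        for record in records:
            write_frame(stream, record)
        write_end(stream)
        results = list(read_frames(stream))
    thread.join()
    return results


if __name__ == "__main__":
    print(run_demo([b"hello", b"spark"]))
```

The sketch sends everything before reading anything back, which is fine for a handful of small records but would need separate reader and writer threads for larger data.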


On Thu, Jul 7, 2016 at 11:01 PM, Amit Rana <amitranavs...@gmail.com> wrote:

> Hi all,
>
> Did anyone get a chance to look into this?
> Any sort of guidance will be much appreciated.
>
> Thanks,
> Amit Rana
> On 7 Jul 2016 14:28, "Amit Rana" <amitranavs...@gmail.com> wrote:
>
>> As mentioned in the documentation:
>> PythonRDD objects launch Python subprocesses and communicate with them
>> using pipes, sending the user's code and the data to be processed.
>>
>> I am trying to understand how this data transfer over pipes is
>> implemented.
>> Can anyone please guide me along those lines?
>>
>> Thanks,
>> Amit Rana
>> On 7 Jul 2016 13:44, "Sun Rui" <sunrise_...@163.com> wrote:
>>
>>> You can read
>>> https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
>>> For PySpark data flow on worker nodes, you can read the source code of
>>> PythonRDD.scala. Python worker processes communicate with Spark executors
>>> via sockets instead of pipes.
>>>
>>> On Jul 7, 2016, at 15:49, Amit Rana <amitranavs...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am trying to trace the data flow in PySpark. I am using IntelliJ IDEA
>>> on Windows 7.
>>> I submitted a Python job as follows:
>>> --master local[4] <path to pyspark job> <arguments to the job>
>>>
>>> I made the following observations after running the above command in
>>> debug mode:
>>> -> Locally, when the PySpark interpreter starts, it also starts a JVM with
>>> which it communicates through a socket.
>>> -> py4j is used to handle this communication.
>>> -> This JVM acts as the actual Spark driver and loads a JavaSparkContext,
>>> which communicates with the Spark executors in the cluster.
>>>
>>> I have read that, in a cluster, the data flow between the Spark executors
>>> and the Python interpreter happens using pipes, but I am not able to trace
>>> that data flow.
>>>
>>> Please correct me if my understanding is wrong. It would be very helpful
>>> if someone could help me understand the code flow for data transfer between
>>> the JVM and the Python workers.
>>>
>>> Thanks,
>>> Amit Rana
>>>
>>>
>>>
