[
https://issues.apache.org/jira/browse/SPARK-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488315#comment-14488315
]
Davies Liu commented on SPARK-6803:
-----------------------------------
After a quick look at the prototype: the callback server sits in a separate
process from the driver, because R does not support multithreading. This
approach has some limitations, for example, accessing shared variables in
callback functions.
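To make the shared-variable limitation concrete, here is a minimal sketch in
plain Python. It is not the prototype's code and does not use Spark or R; the
names CHILD_SOURCE, run_callback_in_other_process and driver_state are
hypothetical and only stand in for "callback server" and "driver". It assumes
the callback runs in a separate OS process that receives a serialized copy of
the driver's state.

```python
import pickle
import subprocess
import sys

# Source run by the stand-in "callback server" process (hypothetical).
CHILD_SOURCE = """
import pickle, sys
state = pickle.load(sys.stdin.buffer)   # the child receives a copy of the state
state["count"] += 1                     # the callback mutates only its own copy
pickle.dump(state, sys.stdout.buffer)   # changes travel back only as an explicit reply
"""


def run_callback_in_other_process(state):
    """Send `state` to a separate process, run the callback there, return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=pickle.dumps(state),
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(result.stdout)


driver_state = {"count": 0}
reply = run_callback_in_other_process(driver_state)

# The driver's variable is unchanged; only the reply carries the update.
print("driver sees:", driver_state)
print("callback replied:", reply)
```

In this sketch the driver would have to merge the reply back into its own
state by hand, which is the kind of extra plumbing a separate-process callback
server implies.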
Also, we need a way to collect the logs from the callback server. This is
needed when you run a streaming job as a daemon process with
dstream.pprint().
This prototype is pretty cool; it shows that a Streaming API in R is doable,
even with some limitations.
But the question is: how many users want to run streaming jobs in R? It will
take a lot of effort to make it production ready. Even the Python API still
needs a lot of work, for example, support for checkpointing and recovery
with HDFS.
> [SparkR] Support SparkR Streaming
> ---------------------------------
>
> Key: SPARK-6803
> URL: https://issues.apache.org/jira/browse/SPARK-6803
> Project: Spark
> Issue Type: New Feature
> Components: SparkR, Streaming
> Reporter: Hao
> Fix For: 1.4.0
>
>
> Adds R API for Spark Streaming.
> An experimental version is presented in repo [1], which follows the PySpark
> streaming design. This PR can also be broken down further into sub-task
> issues.
> [1] https://github.com/hlin09/spark/tree/SparkR-streaming/