[jira] [Created] (SPARK-5037) support dynamic loading of input DStreams in pyspark streaming

Jascha Swisher (JIRA) Wed, 31 Dec 2014 09:25:37 -0800

Jascha Swisher created SPARK-5037:
-------------------------------------

             Summary: support dynamic loading of input DStreams in pyspark streaming
                 Key: SPARK-5037
                 URL: https://issues.apache.org/jira/browse/SPARK-5037
             Project: Spark
          Issue Type: New Feature
          Components: PySpark, Streaming
    Affects Versions: 1.2.0
            Reporter: Jascha Swisher



The Scala and Java streaming APIs support "external" InputDStreams (e.g. the 
ZeroMQReceiver example) through a number of mechanisms, for instance by 
overriding ActorReceiver or just subclassing Receiver directly. The pyspark 
streaming API does not currently allow similar flexibility: at the moment it 
is limited to file-backed text and binary streams and socket text streams.

It would be great to open up the pyspark streaming API to other stream sources, 
bringing it closer to parity with the JVM APIs.

One way of doing this could be to support dynamically loading InputDStream 
implementations through reflection at the JVM level, analogous to what is 
currently done for Hadoop InputFormats in the *Hadoop* methods of the regular 
pyspark context.py.
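
To make the idea concrete, here is a minimal pure-Python sketch of loading a 
class by its fully qualified name at runtime. It is only an analogy: the 
proposal would do the lookup on the JVM side, and the helper names load_class 
and create_stream are hypothetical, not part of any Spark API. A standard 
library class stands in for an InputDStream implementation.

```python
import importlib


def load_class(qualified_name):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError("expected a fully qualified name, got %r" % qualified_name)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def create_stream(qualified_name, *args, **kwargs):
    # Hypothetical helper: the caller names the implementation as a string
    # and passes constructor arguments through unchanged.
    return load_class(qualified_name)(*args, **kwargs)


# Stand-in for a stream implementation (assumption: any importable class works
# for the purpose of illustration).
counter = create_stream("collections.Counter", "abca")
```

The point of the design is that the caller only needs a class name and 
arguments, so new sources can be added without changing the API itself.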

I'll submit a PR momentarily with my shot at this. Comments and alternative 
approaches more than welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

