[jira] [Commented] (SPARK-14688) pyspark textFileStream gzipped

seth (JIRA) Mon, 18 Apr 2016 01:00:59 -0700

    [ https://issues.apache.org/jira/browse/SPARK-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245263#comment-15245263 ]


seth commented on SPARK-14688:
------------------------------

I'm not very familiar with the JVM, but according to:
{{return DStream(self._jssc.textFileStream(directory), self, UTF8Deserializer())}}
it delegates to the JVM through the Java gateway:
{{jssc = gw.jvm.JavaStreamingContext(ssc_option.get())}}
So, yes, I think it uses the Hadoop APIs for I/O.
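For context: Hadoop's text input format chooses a decompression codec from the file extension (`.gz` maps to the gzip codec), which is why the regular `sparkContext.textFile` reads gzipped files transparently. The pure-Python sketch below imitates only that extension-based dispatch. It is not Spark's or Hadoop's code, and `open_text` and `read_lines` are hypothetical helper names.

```python
import gzip
from typing import List


def open_text(path: str):
    """Open a file as UTF-8 text, decompressing on the fly if it ends in .gz.

    Hypothetical helper: imitates choosing a codec by file suffix;
    it is not the actual Hadoop/Spark implementation.
    """
    if path.endswith(".gz"):
        # Text mode ("rt") decompresses and decodes in one step
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path: str) -> List[str]:
    """Return the file's lines without trailing newlines, one per record."""
    with open_text(path) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    import os
    import tempfile

    tmp = tempfile.mkdtemp()
    plain = os.path.join(tmp, "part-0.txt")
    packed = os.path.join(tmp, "part-1.txt.gz")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("a\nb\n")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write("a\nb\n")
    # Both files yield the same records regardless of compression
    print(read_lines(plain) == read_lines(packed))  # True
```

Because the caller never says which files are compressed, plain and gzipped inputs can be mixed in the same directory. The issue asks for that same behavior from the streaming API.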

Also, according to
[http://stackoverflow.com/questions/30043239/apache-spark-streaming-textfilestream-reading-gzip-files]
it seems to work with the main Spark API and the Java API.


> pyspark textFileStream gzipped
> ------------------------------
>
>                 Key: SPARK-14688
>                 URL: https://issues.apache.org/jira/browse/SPARK-14688
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Streaming
>    Affects Versions: 1.6.1
>            Reporter: seth
>              Labels: pyspark, streaming
>
> The PySpark streaming API does not support reading gzip files.
> Two notes:
> 1. The regular SparkContext does support gzip files.
> 2. The Java/Scala methods support streaming gzip files.
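The requested behavior can be sketched as a polling loop. `textFileStream` monitors a directory and reads newly appearing files as text. The sketch below does the same for both plain and `.gz` files. It is a stdlib-only illustration, not a PySpark API. `DirectoryTextStream` and `poll` are hypothetical names, and tracking already-seen files by name is a deliberate simplification of Spark's own file-selection logic.

```python
import gzip
import os
from typing import List, Set


class DirectoryTextStream:
    """Hypothetical polling reader (not part of PySpark).

    Each poll() returns the lines of files that appeared in `directory`
    since the previous poll. Files ending in .gz are decompressed;
    everything else is read as plain UTF-8 text.
    """

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.seen: Set[str] = set()  # file names already consumed

    def _read(self, path: str) -> List[str]:
        # Choose the opener by extension, as the batch API effectively does
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as f:
            return f.read().splitlines()

    def poll(self) -> List[str]:
        """Return all lines from files not seen in earlier polls (one 'batch')."""
        batch: List[str] = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if name in self.seen or not os.path.isfile(path):
                continue
            self.seen.add(name)
            batch.extend(self._read(path))
        return batch
```

One call to `poll()` per batch interval corresponds roughly to one micro-batch. As with Spark's file streams, a real producer should move finished files into the watched directory atomically, so that a partially written file is never read.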



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
