[https://issues.apache.org/jira/browse/SPARK-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245263#comment-15245263]
seth commented on SPARK-14688:
------------------------------
I'm not very familiar with the JVM, but according to:
{{return DStream(self._jssc.textFileStream(directory), self, UTF8Deserializer())}}
it goes through the Java gateway:
{{jssc = gw.jvm.JavaStreamingContext(ssc_option.get())}}
So, yes, I think it uses the Hadoop APIs for I/O.
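If that reading is right, gzip handling would come from Hadoop's input layer, whose CompressionCodecFactory picks a codec from the file suffix (for example `.gz` maps to GzipCodec), so compressed files are decompressed transparently. The plain-Python sketch below only imitates that suffix-based dispatch to illustrate the idea. It is not Spark or Hadoop code, and `open_text` and `read_lines` are hypothetical helpers.

```python
import gzip
import os
import tempfile

# Hypothetical suffix-to-opener table, imitating (by analogy only) how Hadoop
# chooses a compression codec from the file extension. Only gzip is modelled;
# any other suffix is read as plain UTF-8 text.
OPENERS = {
    ".gz": lambda path: gzip.open(path, "rt", encoding="utf-8"),
}


def open_text(path):
    """Open a file as UTF-8 text, decompressing it if its suffix is known."""
    _, ext = os.path.splitext(path)  # "b.txt.gz" -> ".gz"
    opener = OPENERS.get(ext)
    if opener is None:
        return open(path, "r", encoding="utf-8")
    return opener(path)


def read_lines(directory):
    """Return all lines of all files in `directory`, in sorted filename order."""
    lines = []
    for name in sorted(os.listdir(directory)):
        with open_text(os.path.join(directory, name)) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


if __name__ == "__main__":
    # Mix a plain file and a gzipped file in one directory.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a.txt"), "w", encoding="utf-8") as f:
            f.write("plain 1\nplain 2\n")
        with gzip.open(os.path.join(d, "b.txt.gz"), "wt", encoding="utf-8") as f:
            f.write("zipped 1\nzipped 2\n")
        print(read_lines(d))  # ['plain 1', 'plain 2', 'zipped 1', 'zipped 2']
```

The caller never states which files are compressed; the suffix alone decides. That is why, if PySpark really delegates to the JVM `textFileStream`, `.gz` files would be expected to work without extra PySpark code.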
Also, according to [http://stackoverflow.com/questions/30043239/apache-spark-streaming-textfilestream-reading-gzip-files], it seems to work with the main Spark API and the Java API.
> pyspark textFileStream gzipped
> ------------------------------
>
> Key: SPARK-14688
> URL: https://issues.apache.org/jira/browse/SPARK-14688
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Streaming
> Affects Versions: 1.6.1
> Reporter: seth
> Labels: pyspark, streaming
>
> pyspark streamingObject does not support reading gzip files.
> 2 notes:
> 1. The regular SparkContext does support gzip files.
> 2. The Java/Scala methods support streaming gzip files.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)