[
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-5795:
-----------------------------
Priority: Critical (was: Minor)
Yes, I've seen the same problem and have been meaning to do something about it.
It forces a workaround like this to call the method on {{JavaPairDStream}} from Java:
https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L187
So basically, this is how it's declared now:
{code}
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass,
    outputFormatClass, conf)
}
{code}
but this version works, and it matches how the equivalent method is declared in {{JavaPairRDD}}:
{code}
def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass,
    outputFormatClass, conf)
}
{code}
I worry about an API change, of course, but I think the current method isn't
directly callable from Java without a cast, so it seems OK to change.
For a simple demo, try compiling this:
{code}
JavaPairDStream<IntWritable,Text> pds = null;
pds.saveAsNewAPIHadoopFiles("", "", IntWritable.class, Text.class,
    SequenceFileOutputFormat.class);
{code}
The change above makes it work. I'll open a PR. I bumped the priority based on
my understanding of the issue.
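The difference between the two declarations can be reproduced without Spark or Hadoop on the classpath. The following is a minimal sketch, not the actual Spark code: {{OutputFormat}} and {{SequenceFileOutputFormat}} here are hypothetical stand-ins for the Hadoop classes, and the method names {{saveWithWildcard}}, {{saveWithTypeParam}} and {{castedClass}} are made up for illustration. The double cast is one possible workaround under the wildcard signature, and is not necessarily what the linked Oryx code does.

```java
// Hypothetical stand-ins for Hadoop's generic OutputFormat hierarchy.
abstract class OutputFormat<K, V> {}

class SequenceFileOutputFormat<K, V> extends OutputFormat<K, V> {}

class WildcardClassDemo {

    // Mirrors the current declaration: Class[_ <: NewOutputFormat[_, _]]
    static String saveWithWildcard(Class<? extends OutputFormat<?, ?>> outputFormatClass) {
        return outputFormatClass.getSimpleName();
    }

    // Mirrors the proposed declaration: [F <: NewOutputFormat[_, _]] with Class[F]
    static <F extends OutputFormat<?, ?>> String saveWithTypeParam(Class<F> outputFormatClass) {
        return outputFormatClass.getSimpleName();
    }

    // One possible workaround for the wildcard signature: an unchecked double cast.
    @SuppressWarnings("unchecked")
    static Class<? extends OutputFormat<?, ?>> castedClass() {
        return (Class<? extends OutputFormat<?, ?>>) (Class<?>) SequenceFileOutputFormat.class;
    }

    public static void main(String[] args) {
        // The class literal has the raw type Class<SequenceFileOutputFormat>,
        // which javac does not accept for the wildcard parameter:
        // saveWithWildcard(SequenceFileOutputFormat.class); // compile error

        // Accepted only after the cast.
        System.out.println(saveWithWildcard(castedClass()));

        // Accepted directly (with an unchecked warning) once the bound moves
        // onto a method type parameter.
        System.out.println(saveWithTypeParam(SequenceFileOutputFormat.class));
    }
}
```

Uncommenting the {{saveWithWildcard(SequenceFileOutputFormat.class)}} line should reproduce the same kind of compile error as the demo above, while the type-parameter variant accepts the raw class literal.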
> api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
> -------------------------------------------------------------------------
>
> Key: SPARK-5795
> URL: https://issues.apache.org/jira/browse/SPARK-5795
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.2.1
> Reporter: Littlestar
> Priority: Critical
> Attachments: TestStreamCompile.java
>
>
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> The following code does not compile in Java:
> JavaPairDStream<Integer, Integer> rs =....
> rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class,
> TextOutputFormat.class, jobConf);
> But similar code using JavaPairRDD compiles fine:
> JavaPairRDD<String, String> counts =.......
> counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class,
> TextOutputFormat.class, jobConf);
> ====================
> Maybe the declaration could change from:
> def saveAsNewAPIHadoopFiles(
>     prefix: String,
>     suffix: String,
>     keyClass: Class[_],
>     valueClass: Class[_],
>     outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
>     conf: Configuration = new Configuration) {
>   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass,
>     outputFormatClass, conf)
> }
> =====>
> def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
>     prefix: String,
>     suffix: String,
>     keyClass: Class[_],
>     valueClass: Class[_],
>     outputFormatClass: Class[F],
>     conf: Configuration = new Configuration) {
>   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass,
>     outputFormatClass, conf)
> }