Corey J. Nolet created SPARK-4320:
-------------------------------------
Summary: JavaPairRDD should supply a saveAsNewHadoopDataset which
takes a Job object
Key: SPARK-4320
URL: https://issues.apache.org/jira/browse/SPARK-4320
Project: Spark
Issue Type: Improvement
Components: Input/Output, Spark Core
Reporter: Corey J. Nolet
Fix For: 1.1.1, 1.2.0
I am writing data to Accumulo using a custom OutputFormat. I have tried
saveAsNewHadoopFile() and it works, though passing an empty path is a bit
odd. Since what I'm storing is really a dataset rather than a file, I'd be
inclined to use the saveAsHadoopDataset() method, but I'm not at all
interested in using the legacy mapred API.
Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think
there should be two ways of calling it. Rather than always having to set up a
Job object explicitly, I'm in the camp of having the following method
signature:
saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass :
Class[_ <: OutputFormat[_, _]], conf : Configuration)
This way, if I'm writing Spark jobs that go from Hadoop back into Hadoop, I
can construct my Configuration once and reuse it.
Perhaps an overloaded method signature could be:
saveAsNewHadoopDataset(job : Job)
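
To make the two proposed call styles concrete, here is a minimal Java sketch of how they might sit side by side. It is not Spark or Hadoop code: Configuration, Job, OutputFormat, PairDataset and DemoOutputFormat are simplified stand-ins defined in the sketch itself, the property names are invented for illustration, and the methods return the effective configuration instead of writing anything.

```java
import java.util.HashMap;
import java.util.Map;

// All types below are simplified stand-ins, not the real Hadoop or Spark classes.
class SaveAsNewHadoopDatasetSketch {

    // Illustrative property names (assumptions, not Hadoop's real keys).
    static final String KEY_CLASS = "sketch.output.key.class";
    static final String VALUE_CLASS = "sketch.output.value.class";
    static final String FORMAT_CLASS = "sketch.outputformat.class";

    /** Stand-in for Hadoop's Configuration: a plain string key/value store. */
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Stand-in for the new-API (mapreduce) OutputFormat. */
    interface OutputFormat<K, V> { }

    /** Stand-in for Job: here it only wraps a Configuration. */
    static class Job {
        private final Configuration conf;
        Job(Configuration conf) { this.conf = conf; }
        Configuration getConfiguration() { return conf; }
    }

    /** Hypothetical custom output format, e.g. one that writes to Accumulo. */
    static class DemoOutputFormat implements OutputFormat<String, Long> { }

    /** Stand-in for JavaPairRDD<K, V>, carrying only the two proposed overloads. */
    static class PairDataset<K, V> {

        // Proposed form 1: classes plus a Configuration built once by the caller.
        // Returns the configuration a real implementation would hand to the writer.
        Configuration saveAsNewHadoopDataset(Class<K> keyClass, Class<V> valueClass,
                Class<? extends OutputFormat<?, ?>> ofClass, Configuration conf) {
            conf.set(KEY_CLASS, keyClass.getName());
            conf.set(VALUE_CLASS, valueClass.getName());
            conf.set(FORMAT_CLASS, ofClass.getName());
            return conf;
        }

        // Proposed form 2: the Job already carries its settings, so reuse them.
        Configuration saveAsNewHadoopDataset(Job job) {
            return job.getConfiguration();
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        PairDataset<String, Long> pairs = new PairDataset<>();
        Configuration used = pairs.saveAsNewHadoopDataset(
                String.class, Long.class, DemoOutputFormat.class, conf);
        System.out.println(used.get(FORMAT_CLASS));
        // The same Configuration can back a Job for the second overload.
        System.out.println(pairs.saveAsNewHadoopDataset(new Job(conf)) == conf);
    }
}
```

In this sketch the Job overload is a thin wrapper over the Configuration it holds, which is why the Configuration-based form seems like the more natural primary entry point.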