[
https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Corey J. Nolet closed SPARK-4320.
---------------------------------
Resolution: Won't Fix
Target Version/s: 1.2.1, 1.1.2 (was: 1.1.2, 1.2.1)
> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object
> ----------------------------------------------------------------------------
>
> Key: SPARK-4320
> URL: https://issues.apache.org/jira/browse/SPARK-4320
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output, Spark Core
> Reporter: Corey J. Nolet
>
> I am writing data to Accumulo using a custom OutputFormat. I have tried
> saveAsNewHadoopFile() and it works, though passing an empty path is a bit
> weird. Since what I'm storing isn't really a file but a generic pair
> dataset, I'd be inclined to use the saveAsHadoopDataset() method, except
> that I'm not at all interested in using the legacy mapred API.
> Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think
> there should be two ways of calling it. Instead of forcing the user to
> always set up the Job object explicitly, I'm in favor of the following
> method signature:
> saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass :
> Class[? extends OutputFormat], conf : Configuration). This way, if I'm
> writing Spark jobs that go from Hadoop back into Hadoop, I can construct my
> Configuration once.
> An overloaded method signature could then be:
> saveAsNewHadoopDataset(job : Job)
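
The two call styles proposed in the description could be sketched roughly as follows. This is only an illustration of the overload design, not Spark or Hadoop code: `SinkConf`, `SinkJob`, `PairDataset`, and the `sketch.*` setting names are hypothetical stand-ins invented here so the example runs without any dependencies. The assumption is that the convenience overload builds the job from the caller's configuration and then delegates to the `Job`-based overload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SaveOverloadsSketch {

    /** Hypothetical stand-in for a Hadoop Configuration: a plain key/value map. */
    static final class SinkConf {
        final Map<String, String> settings = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a Job: holds its own copy of a configuration. */
    static final class SinkJob {
        final SinkConf conf = new SinkConf();

        SinkJob(SinkConf base) {
            // Copy the settings so the caller's configuration is never mutated.
            conf.settings.putAll(base.settings);
        }
    }

    /** Hypothetical stand-in for a pair dataset; it only records the settings used. */
    static final class PairDataset<K, V> {
        Map<String, String> lastWriteSettings;

        // Convenience overload: the caller supplies the classes and a reusable
        // configuration, and the job is assembled internally.
        void saveAsNewHadoopDataset(Class<K> keyClass,
                                    Class<V> valueClass,
                                    Class<?> outputFormatClass,
                                    SinkConf conf) {
            SinkJob job = new SinkJob(conf);
            job.conf.settings.put("sketch.output.key.class", keyClass.getName());
            job.conf.settings.put("sketch.output.value.class", valueClass.getName());
            job.conf.settings.put("sketch.output.format.class", outputFormatClass.getName());
            saveAsNewHadoopDataset(job);
        }

        // Job-based overload: the caller has already configured everything.
        void saveAsNewHadoopDataset(SinkJob job) {
            // A real implementation would write the records here; the sketch
            // only remembers which settings the write would have used.
            lastWriteSettings = new LinkedHashMap<>(job.conf.settings);
        }
    }
}
```

With this shape, a caller that reads from and writes back to the same system can build one configuration object and pass it to every save call, while callers that need full control can still hand over a prepared job.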