[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208084#comment-14208084 ]

Corey J. Nolet commented on SPARK-4320:
---------------------------------------

Since this is a simple change, I'd like to work on it myself to get more 
familiar with the code base. Could someone with the proper privileges grant me 
the access needed to assign this ticket to myself?

> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object 
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4320
>                 URL: https://issues.apache.org/jira/browse/SPARK-4320
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Spark Core
>            Reporter: Corey J. Nolet
>             Fix For: 1.1.1, 1.2.0
>
>
> I am outputting data to Accumulo using a custom OutputFormat. I have tried 
> using saveAsNewHadoopFile() and that works, though passing an empty path is a 
> bit weird. Since it isn't really a file I'm storing, but rather a generic 
> Pair dataset, I'd be inclined to use the saveAsHadoopDataset() method, though 
> I'm not at all interested in using the legacy mapred API.
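> For context, here is roughly what the empty-path workaround looks like today 
> (a sketch only; the Accumulo classes, table name, and Configuration setup are 
> stand-ins for my actual job):
>
>     import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat
>     import org.apache.accumulo.core.data.{Mutation, Value}
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.io.Text
>     import org.apache.spark.{SparkConf, SparkContext}
>     import org.apache.spark.SparkContext._
>
>     val sc = new SparkContext(new SparkConf().setAppName("accumulo-output"))
>     val conf = new Configuration()
>     // ... Accumulo instance/connector settings would be set on conf here ...
>
>     // Build a pair RDD of (table name, Mutation), the types the
>     // Accumulo OutputFormat expects.
>     val pairs = sc.parallelize(Seq("row1", "row2")).map { row =>
>       val m = new Mutation(new Text(row))
>       m.put(new Text("cf"), new Text("cq"), new Value("v".getBytes))
>       (new Text("mytable"), m)
>     }
>
>     // The empty path is the awkward part: no file is actually written.
>     pairs.saveAsNewAPIHadoopFile("", classOf[Text], classOf[Mutation],
>       classOf[AccumuloOutputFormat], conf)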
> Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think 
> there should be two ways of calling into this method. Instead of forcing the 
> user to always set up the Job object explicitly, I'm in the camp of having 
> the following method signature:
> saveAsNewHadoopDataset(keyClass: Class[K], valueClass: Class[V], ofClass: 
> Class[_ <: OutputFormat[K, V]], conf: Configuration). This way, if I'm 
> writing Spark jobs that go from Hadoop back into Hadoop, I can construct my 
> Configuration once.
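> To make that concrete, usage of the proposed method might look like this 
> (a sketch; saveAsNewHadoopDataset does not exist yet, and it reuses the 
> pairs RDD and conf from the sketch above):
>
>     // The same Configuration already used for the input side of the job.
>     pairs.saveAsNewHadoopDataset(classOf[Text], classOf[Mutation],
>       classOf[AccumuloOutputFormat], conf)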
> Perhaps an overloaded method signature could be:
> saveAsNewHadoopDataset(job : Job)
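> Again as a sketch of the proposed overload (Job.getInstance is the Hadoop 2 
> way to build a Job from an existing Configuration; the call itself is 
> illustrative, not existing API):
>
>     import org.apache.hadoop.mapreduce.Job
>
>     val job = Job.getInstance(conf)
>     job.setOutputFormatClass(classOf[AccumuloOutputFormat])
>     job.setOutputKeyClass(classOf[Text])
>     job.setOutputValueClass(classOf[Mutation])
>     pairs.saveAsNewHadoopDataset(job)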




