[ 
https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corey J. Nolet updated SPARK-4320:
----------------------------------
    Description: 
I am outputting data to Accumulo using a custom OutputFormat. I have tried 
using saveAsNewHadoopFile() and that works, though passing an empty path is a 
bit odd. Since what I'm storing isn't really a file but rather a generic 
Pair dataset, I'd be inclined to use the saveAsHadoopDataset() method, though 
I'm not at all interested in using the legacy mapred API.

Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think 
there should be two ways of calling it. Rather than forcing the user to 
always set up the Job object explicitly, I'm in the camp of having the 
following method signature:

saveAsNewHadoopDataset(keyClass: Class[K], valueClass: Class[V], ofClass: 
Class[_ <: OutputFormat[K, V]], conf: Configuration). This way, if I'm writing 
Spark jobs that go from Hadoop back into Hadoop, I can construct my 
Configuration once.

Perhaps an overloaded method signature could be:

saveAsNewHadoopDataset(job : Job)

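A sketch of the two proposed overloads, using the new-API (mapreduce) Hadoop 
classes. This is illustrative only — the method names, generics, and placement 
are my reading of the proposal above, not an existing Spark API:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{Job, OutputFormat}

// Variant 1: the caller supplies the pieces and the method builds the
// Job internally, so a Hadoop-in/Hadoop-out Spark job can reuse a single
// Configuration for both input and output.
def saveAsNewHadoopDataset[K, V](
    keyClass: Class[K],
    valueClass: Class[V],
    ofClass: Class[_ <: OutputFormat[K, V]],
    conf: Configuration): Unit = ???

// Variant 2: the caller sets up the Job object explicitly.
def saveAsNewHadoopDataset(job: Job): Unit = ???
```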

  was:
I am outputting data to Accumulo using a custom outputformat. I have tried 
using saveAsNewHadoopFile() and that works- though passing an empty path is a 
bit weird. Being that it isn't really a file I'm store, but rather a dataset, 
I'd be inclined to use the saveAsHadoopDataset() method, though I'm not at all 
interested in using the legacy mapred API.

Perhaps we could supply a saveAsNewHadoopDateset method. Personally, I think 
there should be two ways of calling into this method. Instead of needing to set 
up the Job object explicitly, I'm in the camp of having the following method 
signature:

saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass : 
Class[? extends OutputFormat], conf : Configuration). This way, if I'm writing 
spark jobs that are going from Hadoop back into Hadoop, I can construct my 
Configuration once.

Perhaps an overloaded method signature could be:

saveAsNewHadoopDataset(job : Job)



> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object 
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4320
>                 URL: https://issues.apache.org/jira/browse/SPARK-4320
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Spark Core
>            Reporter: Corey J. Nolet
>             Fix For: 1.1.1, 1.2.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
