Any discussion on this?  

I would like to hear more advice from the community before I create the PR.

An example is how we create a NewHadoopRDD today.


We get a Configuration from the Job object (which is itself a JobContext):

val updatedConf = job.getConfiguration   // the Job is built only so we can pull out its Configuration
new NewHadoopRDD(this, fClass, kClass, vClass, updatedConf)
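
For context, that snippet sits inside SparkContext.newAPIHadoopFile, which builds a mapreduce.Job only as a carrier for the Configuration and then throws it away. A paraphrased sketch, not the exact source (NewHadoopJob / NewInputFormat / NewFileInputFormat stand for the usual import aliases of the mapreduce classes):

def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration): RDD[(K, V)] = {
  val job = new NewHadoopJob(conf)                     // a mapreduce.Job, used only as a Configuration holder
  NewFileInputFormat.addInputPath(job, new Path(path)) // the input path is recorded in the Job's conf
  val updatedConf = job.getConfiguration               // only the Configuration survives from here on
  new NewHadoopRDD(this, fClass, kClass, vClass, updatedConf)
}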


Then, inside NewHadoopRDD, we create a JobContext based on this Configuration object:

NewHadoopRDD.scala (L74)
val jobContext = newJobContext(conf, jobId)   // rebuild a JobContext from the bare Configuration
val rawSplits = inputFormat.getSplits(jobContext).toArray


Because inputFormat comes from the mapreduce package, its methods only accept a 
JobContext as the parameter.
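
For reference, the new-API InputFormat contract looks roughly like this in Scala notation (the actual class is the Java org.apache.hadoop.mapreduce.InputFormat):

abstract class InputFormat[K, V] {
  def getSplits(context: JobContext): java.util.List[InputSplit]
  def createRecordReader(split: InputSplit,
                         context: TaskAttemptContext): RecordReader[K, V]
}

There is no variant that takes a bare Configuration, so NewHadoopRDD has to rebuild a JobContext around the conf it was given.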


I think we should avoid introducing Configuration as the parameter, but as before, 
this will change the APIs; a rough sketch of what that could look like is below.
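
To make the proposal concrete, here is a sketch of a Job-based entry point. The overload of newAPIHadoopRDD that accepts a Job is hypothetical, only meant to illustrate the direction, not a final signature:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

// the user configures everything through a Job, exactly as in plain new-API Hadoop code
val job = Job.getInstance(sc.hadoopConfiguration)
FileInputFormat.addInputPath(job, new Path("hdfs://..."))

// hypothetical overload: Spark accepts the Job itself instead of pulling out its Configuration
val rdd = sc.newAPIHadoopRDD(job, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])

Internally Spark could still call job.getConfiguration where it needs to, but users would never have to pass a bare Configuration around.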


Best,  

--  
Nan Zhu


On Wednesday, February 26, 2014 at 8:23 AM, Nan Zhu wrote:

> Hi, all  
>  
> I just created a JIRA https://spark-project.atlassian.net/browse/SPARK-1139 . 
> The issue discusses that:
>  
> the Spark APIs based on the new Hadoop API are actually a mixture of the old 
> and new Hadoop APIs.
>  
> The Spark APIs still take JobConf (or Configuration) as one of the 
> parameters, but Configuration has been replaced by mapreduce.Job in the new 
> Hadoop API.
>  
> for example : 
> http://codesfusion.blogspot.ca/2013/10/hadoop-wordcount-with-new-map-reduce-api.html
>   
>  
> &  
>  
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (p10)
>  
> Personally I think it’s better to fix this design, but it will introduce some 
> compatibility issues.  
>  
> Just bringing it up here for your advice.
>  
> Best,  
>  
> --  
> Nan Zhu
>  
