[jira] [Updated] (SPARK-6513) Regression - Adding zipWithUniqueId (and other missing RDD APIs) to RDDApi.scala

Eran Medan (JIRA) Tue, 24 Mar 2015 14:57:09 -0700

     [ https://issues.apache.org/jira/browse/SPARK-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Eran Medan updated SPARK-6513:
------------------------------
    Description: 
I'm sure there is already an issue for this somewhere, but I can't find it.

I see this as a regression: the code below compiled in 1.2.1 but stopped 
compiling in 1.3, without any earlier deprecation warning. I'm sure the authors 
are well aware of it, so please change this to an enhancement request if you 
disagree that it is a regression. It's such an obvious and blunt change that I 
doubt it was made without a lot of thought, and I'm sure there was a good 
reason, but it still breaks my code and I don't have a workaround :)

Here are the details and steps to reproduce:

*Worked in 1.2.1* (without any deprecation warnings)
{code}
     val sqlContext = new HiveContext(sc)
     import sqlContext._
     val jsonRDD = sqlContext.jsonFile(jsonFilePath)
     jsonRDD.registerTempTable("jsonTable")

     val jsonResult = sql(s"select * from jsonTable")
     val foo = jsonResult.zipWithUniqueId().map {
       case (Row(...), uniqueId) => // do something useful
       ...
     }

     foo.registerTempTable("...")

{code}

*Stopped working in 1.3.0* (simply does not compile; all I did was upgrade to 
1.3)
{code}
    jsonResult.zipWithUniqueId() // does not compile: RDDApi doesn't implement this method
{code}
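For context on what was lost: per the RDD scaladoc, {{zipWithUniqueId}} gives the items in the kth of n partitions the ids k, n+k, 2n+k, ..., so the ids are unique but may have gaps, and unlike {{zipWithIndex}} it does not trigger a Spark job. The plain-Scala sketch below only models that numbering on a Seq of partitions; the object and method names are hypothetical and this is not Spark's implementation.

```scala
// Hypothetical illustration (not Spark code): models the id scheme documented
// for RDD.zipWithUniqueId, using a Seq of Seqs to stand in for partitions.
object UniqueIdScheme {
  def zipWithUniqueId[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    // n = number of partitions
    val n = partitions.size.toLong
    partitions.zipWithIndex.map { case (items, k) =>
      // The i-th item of partition k gets id i * n + k, i.e. k, n + k, 2n + k, ...
      items.zipWithIndex.map { case (item, i) => (item, i * n + k) }
    }
  }
}
```

For partitions a,b | c | d,e,f (n = 3) this yields a→0, b→3, c→1, d→2, e→5, f→8, so ids 4, 6 and 7 are never used.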

*Workaround that doesn't work:*

Going through {{map(identity)}} gives me an {{RDD\[Row\]}}, which does have {{zipWithUniqueId}}:
{code}
    jsonResult.map(identity).zipWithUniqueId()  
{code}

But the result is then a plain RDD rather than a DataFrame, and an RDD has no 
{{registerTempTable}} method, so this doesn't compile:
{code}
    foo.registerTempTable("...")
{code}
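One possible workaround, sketched under the assumption that the Spark 1.3 APIs {{DataFrame.rdd}}, {{Row.fromSeq}} and {{SQLContext.createDataFrame(RDD\[Row\], StructType)}} behave as documented: zip the underlying {{RDD\[Row\]}}, append each id to its row, and rebuild a DataFrame with an extended schema so that {{registerTempTable}} is available again. The object {{UniqueIdWorkaround}} and its method {{withUniqueId}} are hypothetical, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper (not part of Spark), assuming the Spark 1.3 DataFrame API:
// appends a unique Long id column by round-tripping through RDD[Row].
object UniqueIdWorkaround {
  def withUniqueId(sqlContext: SQLContext, df: DataFrame, idColumn: String): DataFrame = {
    // DataFrame.rdd exposes the underlying RDD[Row], which still has zipWithUniqueId.
    val rowsWithId = df.rdd.zipWithUniqueId().map { case (row, id) =>
      // Append the id as the last field of each row.
      Row.fromSeq(row.toSeq :+ id)
    }
    // Extend the original schema with a non-nullable Long column for the id.
    val schema = StructType(df.schema.fields :+ StructField(idColumn, LongType, nullable = false))
    // Rebuild a DataFrame so that registerTempTable can be called on the result.
    sqlContext.createDataFrame(rowsWithId, schema)
  }
}
```

With the report's own values this would be used as {{UniqueIdWorkaround.withUniqueId(sqlContext, jsonResult, "uniqueId").registerTempTable("...")}}; the column name {{uniqueId}} is an arbitrary choice. Because the new schema is built from {{df.schema}}, the original columns keep their names and types.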

(see related SO question: 
http://stackoverflow.com/questions/29243186/is-this-a-regression-bug-in-spark-1-3)



  was:
I'm sure there is already an issue for this somewhere, but I can't find it.

I see this as a regression: the code below compiled in 1.2.1 but stopped 
compiling in 1.3, without any earlier deprecation warning. I'm sure the authors 
are well aware of it, so please change this to an enhancement request if you 
disagree that it is a regression. It's such an obvious and blunt change that I 
doubt it was made without a lot of thought, and I'm sure there was a good 
reason, but it still breaks my code and I don't have a workaround :)

Here are the details and steps to reproduce:

**Worked in 1.2.1** (without any deprecation warnings)

     val sqlContext = new HiveContext(sc)
     import sqlContext._
     val jsonRDD = sqlContext.jsonFile(jsonFilePath)
     jsonRDD.registerTempTable("jsonTable")

     val jsonResult = sql(s"select * from jsonTable")
     val foo = jsonResult.zipWithUniqueId().map {
       case (Row(...), uniqueId) => // do something useful
       ...
     }

     foo.registerTempTable("...")

**Stopped working in 1.3.0** (simply does not compile; all I did was upgrade 
to 1.3)

    jsonResult.zipWithUniqueId() // does not compile: RDDApi doesn't implement this method


**Workaround that doesn't work:**

Going through `map(identity)` gives me an `RDD[Row]`, which does have `zipWithUniqueId`:

    jsonResult.map(identity).zipWithUniqueId()  

But the result is then a plain RDD rather than a DataFrame, and `RDD[Row]` has 
no `registerTempTable` method, so this doesn't compile:

         foo.registerTempTable("...")


(see related SO question: 
http://stackoverflow.com/questions/29243186/is-this-a-regression-bug-in-spark-1-3)




> Regression - Adding zipWithUniqueId (and other missing RDD APIs) to 
> RDDApi.scala
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-6513
>                 URL: https://issues.apache.org/jira/browse/SPARK-6513
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>         Environment: Windows 7 64bit, Scala 2.11.6, JDK 1.7.0_21 (though I 
> don't think it's relevant)
>            Reporter: Eran Medan
>            Priority: Blocker
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
