[ 
https://issues.apache.org/jira/browse/SPARK-25737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25737:
------------------------------
    Environment:     (was: In ancient times in 2013, JavaSparkContext got a 
superclass JavaSparkContextVarargsWorkaround to deal with some Scala 2.7 issue: 
[http://www.scala-archive.org/Workaround-for-implementing-java-varargs-in-2-7-2-final-td1944767.html#a1944772]

I believe this was really resolved by the {{@varargs}} annotation in Scala 2.9.

I believe we can now remove this workaround. Along the way, I think we can also 
avoid the duplicated definitions of {{union()}}. Where we should be able to 
just have one varargs method, we have up to 3 forms:

- {{union(RDD, Seq/List)}}

- {{union(RDD*)}}

- {{union(RDD, RDD*)}}

While this pattern is sometimes used to avoid type collision due to erasure, I 
don't think it applies here.

After this cleanup, we'll have 1 SparkContext and 3 JavaSparkContext methods 
(for the 3 Java RDD types), not 11 methods.

The only difference for callers in Spark 3 would be that {{sc.union(Seq(rdd1, 
rdd2))}} now has to be {{sc.union(rdd1, rdd2)}} (simpler) or 
{{sc.union(Seq(rdd1, rdd2): _*)}})
    Description: 
In ancient times in 2013, JavaSparkContext got a superclass 
JavaSparkContextVarargsWorkaround to deal with some Scala 2.7 issue: 
[http://www.scala-archive.org/Workaround-for-implementing-java-varargs-in-2-7-2-final-td1944767.html#a1944772]

I believe this was really resolved by the {{@varargs}} annotation in Scala 2.9.

I believe we can now remove this workaround. Along the way, I think we can also 
avoid the duplicated definitions of {{union()}}. Where we should be able to 
just have one varargs method, we have up to 3 forms:
 - {{union(RDD, Seq/List)}}

 - {{union(RDD*)}}

 - {{union(RDD, RDD*)}}

While this pattern is sometimes used to avoid type collision due to erasure, I 
don't think it applies here.

After this cleanup, we'll have 1 SparkContext and 3 JavaSparkContext methods 
(for the 3 Java RDD types), not 11 methods.

The only difference for callers in Spark 3 would be that {{sc.union(Seq(rdd1, 
rdd2))}} now has to be {{sc.union(rdd1, rdd2)}} (simpler) or 
{{sc.union(Seq(rdd1, rdd2): _*)}}
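
As a rough illustration of the proposed shape, here is a minimal sketch that does not depend on Spark. {{FakeRdd}} and {{FakeContext}} are hypothetical stand-ins invented for this example, not Spark classes; only {{scala.annotation.varargs}} and the {{: _*}} expansion syntax are real Scala features. It assumes a single {{@varargs}} method is enough to serve both Scala and Java callers.

```scala
import scala.annotation.varargs

// Hypothetical stand-in for an RDD; not Spark's class.
final case class FakeRdd[T](items: Seq[T])

// Hypothetical stand-in for a context exposing a single varargs union().
class FakeContext {
  // @varargs asks scalac to also emit a Java-style array-based forwarder,
  // so Java code can call ctx.union(a, b, c) without a workaround superclass.
  @varargs
  def union[T](rdds: FakeRdd[T]*): FakeRdd[T] =
    FakeRdd(rdds.flatMap(_.items))
}

object UnionSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new FakeContext
    val rdd1 = FakeRdd(Seq(1, 2))
    val rdd2 = FakeRdd(Seq(3))

    // Direct varargs call: the simpler form.
    val direct = ctx.union(rdd1, rdd2)

    // An existing Seq has to be expanded explicitly with `: _*`.
    val expanded = ctx.union(Seq(rdd1, rdd2): _*)

    assert(direct == expanded)
    println(direct.items.mkString(","))
  }
}
```

With only the varargs form defined, passing a {{Seq}} directly no longer compiles, which is the caller-visible change described above.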

> Remove JavaSparkContextVarargsWorkaround and standardize union() methods
> ------------------------------------------------------------------------
>
>                 Key: SPARK-25737
>                 URL: https://issues.apache.org/jira/browse/SPARK-25737
>             Project: Spark
>          Issue Type: Task
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>
> In ancient times in 2013, JavaSparkContext got a superclass 
> JavaSparkContextVarargsWorkaround to deal with some Scala 2.7 issue: 
> [http://www.scala-archive.org/Workaround-for-implementing-java-varargs-in-2-7-2-final-td1944767.html#a1944772]
> I believe this was really resolved by the {{@varargs}} annotation in Scala 
> 2.9. 
> I believe we can now remove this workaround. Along the way, I think we can 
> also avoid the duplicated definitions of {{union()}}. Where we should be able 
> to just have one varargs method, we have up to 3 forms:
>  - {{union(RDD, Seq/List)}}
>  - {{union(RDD*)}}
>  - {{union(RDD, RDD*)}}
> While this pattern is sometimes used to avoid type collision due to erasure, 
> I don't think it applies here.
> After this cleanup, we'll have 1 SparkContext and 3 JavaSparkContext methods 
> (for the 3 Java RDD types), not 11 methods.
> The only difference for callers in Spark 3 would be that {{sc.union(Seq(rdd1, 
> rdd2))}} now has to be {{sc.union(rdd1, rdd2)}} (simpler) or 
> {{sc.union(Seq(rdd1, rdd2): _*)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
