Please do! Thanks.
On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> Ok, do you want me to open a pull request to fix the dedicated
> documentation?
>
> On Fri, Apr 17, 2015 at 18:14, Reynold Xin <r...@databricks.com> wrote:
>
>> I think in 1.3 and above, you'd need to do
>>
>> .sql(...).javaRDD().map(..)
>>
>> On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>>> Yes, thanks!
>>>
>>> On Fri, Apr 17, 2015 at 16:20, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> > The image didn't go through.
>>> >
>>> > I think you were referring to:
>>> > override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
>>> >
>>> > Cheers
>>> >
>>> > On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <
>>> > o.girar...@lateral-thoughts.com> wrote:
>>> >
>>> > > Hi everyone,
>>> > > I had an issue trying to use Spark SQL from Java (8 or 7). I tried
>>> > > to reproduce it in a small test case close to the actual
>>> > > documentation
>>> > > <https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>,
>>> > > so sorry for the long mail, but this is "Java":
>>> > >
>>> > > import org.apache.spark.SparkConf;
>>> > > import org.apache.spark.api.java.JavaRDD;
>>> > > import org.apache.spark.api.java.JavaSparkContext;
>>> > > import org.apache.spark.sql.DataFrame;
>>> > > import org.apache.spark.sql.SQLContext;
>>> > >
>>> > > import java.io.Serializable;
>>> > > import java.util.ArrayList;
>>> > > import java.util.Arrays;
>>> > > import java.util.List;
>>> > >
>>> > > class Movie implements Serializable {
>>> > >     private int id;
>>> > >     private String name;
>>> > >
>>> > >     public Movie(int id, String name) {
>>> > >         this.id = id;
>>> > >         this.name = name;
>>> > >     }
>>> > >
>>> > >     public int getId() {
>>> > >         return id;
>>> > >     }
>>> > >
>>> > >     public void setId(int id) {
>>> > >         this.id = id;
>>> > >     }
>>> > >
>>> > >     public String getName() {
>>> > >         return name;
>>> > >     }
>>> > >
>>> > >     public void setName(String name) {
>>> > >         this.name = name;
>>> > >     }
>>> > > }
>>> > >
>>> > > public class SparkSQLTest {
>>> > >     public static void main(String[] args) {
>>> > >         SparkConf conf = new SparkConf();
>>> > >         conf.setAppName("My Application");
>>> > >         conf.setMaster("local");
>>> > >         JavaSparkContext sc = new JavaSparkContext(conf);
>>> > >
>>> > >         ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
>>> > >         movieArrayList.add(new Movie(1, "Indiana Jones"));
>>> > >
>>> > >         JavaRDD<Movie> movies = sc.parallelize(movieArrayList);
>>> > >
>>> > >         SQLContext sqlContext = new SQLContext(sc);
>>> > >         DataFrame frame = sqlContext.applySchema(movies, Movie.class);
>>> > >         frame.registerTempTable("movies");
>>> > >
>>> > >         sqlContext.sql("select name from movies")
>>> > >             .map(row -> row.getString(0)) // this is what I would expect to work
>>> > >             .collect();
>>> > >     }
>>> > > }
>>> > >
>>> > > But this does not compile. Here is the compilation error:
>>> > >
>>> > > [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
>>> > > method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
>>> > > [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
>>> > > [ERROR] found: (row)->"Na[...]ng(0)
>>> > > [ERROR] reason: cannot infer type-variable(s) R
>>> > > [ERROR] (actual and formal argument lists differ in length)
>>> > > [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
>>> > > method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
>>> > > [ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
>>> > > [ERROR] found: (row)->row[...]ng(0)
>>> > > [ERROR] reason: cannot infer type-variable(s) R
>>> > > [ERROR] (actual and formal argument lists differ in length)
>>> > > [ERROR] -> [Help 1]
>>> > >
>>> > > This happens because, in DataFrame, the *map* method is defined as:
>>> > >
>>> > > [image: Inline image 1]
>>> > >
>>> > > Once this is translated to bytecode, the actual Java signature
>>> > > takes a Function1 and adds a ClassTag parameter.
>>> > > I can try to work around this by using scala.reflect.ClassTag$
>>> > > like this:
>>> > >
>>> > > ClassTag$.MODULE$.apply(String.class)
>>> > >
>>> > > That gets the second ClassTag parameter right, but then
>>> > > instantiating a java.util.function.Function or using Java 8
>>> > > lambdas fails to work, and if I try to instantiate a proper Scala
>>> > > Function1... well, this is a world of pain.
>>> > >
>>> > > This is a regression introduced by the 1.3.x DataFrame: JavaSchemaRDD
>>> > > used to be JavaRDDLike, but DataFrames are not (and are not callable
>>> > > with JFunctions). I can open a Jira if you want.
>>> > >
>>> > > Regards,
>>> > >
>>> > > --
>>> > > *Olivier Girardot* | Partner
>>> > > o.girar...@lateral-thoughts.com
>>> > > +33 6 24 09 17 94
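
The "actual and formal argument lists differ in length" error discussed in the thread can be reproduced without Spark on the classpath. The sketch below is only a model, under stated assumptions: `TypeTag`, `ScalaStyleFrame` and `JavaStyleView` are hypothetical stand-ins (for `scala.reflect.ClassTag`, `DataFrame` and the result of `javaRDD()` respectively), and rows are plain `String[]` arrays. It covers only the extra-argument part of the problem, not the separate difficulty of passing a Java lambda where a `scala.Function1` is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SignatureMismatchSketch {

    /** Hypothetical stand-in for scala.reflect.ClassTag<R>: carries R's runtime class. */
    static final class TypeTag<R> {
        final Class<R> runtimeClass;

        TypeTag(Class<R> runtimeClass) {
            this.runtimeClass = runtimeClass;
        }
    }

    /** Hypothetical stand-in for DataFrame: map needs a function AND a tag. */
    static final class ScalaStyleFrame {
        private final List<String[]> rows;

        ScalaStyleFrame(List<String[]> rows) {
            this.rows = rows;
        }

        // Models how a Scala context bound [R: ClassTag] appears from Java:
        // as a second, explicit parameter.
        <R> List<R> map(Function<String[], R> f, TypeTag<R> tag) {
            List<R> out = new ArrayList<>();
            for (String[] row : rows) {
                out.add(tag.runtimeClass.cast(f.apply(row)));
            }
            return out;
        }

        // Models javaRDD(): hands back a view whose map takes one argument.
        JavaStyleView javaView() {
            return new JavaStyleView(rows);
        }
    }

    /** Hypothetical stand-in for the Java-friendly RDD wrapper. */
    static final class JavaStyleView {
        private final List<String[]> rows;

        JavaStyleView(List<String[]> rows) {
            this.rows = rows;
        }

        <R> List<R> map(Function<String[], R> f) {
            List<R> out = new ArrayList<>();
            for (String[] row : rows) {
                out.add(f.apply(row));
            }
            return out;
        }
    }

    static List<String> names() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Indiana Jones"});
        ScalaStyleFrame frame = new ScalaStyleFrame(rows);

        // frame.map(row -> row[0]);
        // ^ would not compile: map expects two arguments, one was given.

        // Going through the Java-friendly view first makes the lambda fit.
        return frame.javaView().map(row -> row[0]);
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```

In the thread's own terms, the one-argument call corresponds to the `.sql(...).javaRDD().map(..)` form suggested above for 1.3 and later.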