Hi everyone, I hit an issue trying to use Spark SQL from Java (8 or 7). I tried to reproduce it in a small test case close to the actual documentation <https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>, so sorry for the long mail, but this is "Java":
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.ArrayList;

class Movie implements Serializable {
    private int id;
    private String name;

    public Movie(int id, String name) {
        this.id = id;
        this.name = name;
    }
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

public class SparkSQLTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("My Application");
        conf.setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
        movieArrayList.add(new Movie(1, "Indiana Jones"));
        JavaRDD<Movie> movies = sc.parallelize(movieArrayList);

        SQLContext sqlContext = new SQLContext(sc);
        DataFrame frame = sqlContext.applySchema(movies, Movie.class);
        frame.registerTempTable("movies");

        sqlContext.sql("select name from movies")
            .map(row -> row.getString(0)) // this is what I would expect to work
            .collect();
    }
}

But this does not compile. Here's the compilation error:

[ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47] method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
[ERROR]   required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
[ERROR]   found: (row)->"Na[...]ng(0)
[ERROR]   reason: cannot infer type-variable(s) R
[ERROR]     (actual and formal argument lists differ in length)
[ERROR] -> [Help 1]

Because in DataFrame the map method is defined as:

[image: inline screenshot of the DataFrame.map signature]

And once this is translated to bytecode, the actual Java signature uses a Function1 and adds a ClassTag parameter. I can try to work around this and use scala.reflect.ClassTag$ like that:

ClassTag$.MODULE$.apply(String.class)

to get the second ClassTag parameter right, but then instantiating a java.util.function.Function or using Java 8 lambdas fails to work, and if I try to instantiate a proper Scala Function1... well, this is a world of pain.

This is a regression introduced by the 1.3.x DataFrame, because JavaSchemaRDD used to be a JavaRDDLike but DataFrames are not (and are not callable with JFunctions). I can open a Jira if you want?

Regards,

--
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
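For reference, the workaround I've found for now is to drop back to a JavaRDD<Row> before mapping: DataFrame exposes a javaRDD() bridge in 1.3.x, and JavaRDD.map takes the Java-friendly function interface, so the lambda compiles again without any ClassTag gymnastics. A minimal sketch (same Movie bean as above, assuming the 1.3.x API; the class and table names are just illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class SparkSQLWorkaround {

    // Same JavaBean as in the failing example, nested here to keep it self-contained.
    public static class Movie implements Serializable {
        private int id;
        private String name;

        public Movie(int id, String name) {
            this.id = id;
            this.name = name;
        }
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Workaround").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame frame = sqlContext.applySchema(
            sc.parallelize(Arrays.asList(new Movie(1, "Indiana Jones"))), Movie.class);
        frame.registerTempTable("movies");

        // javaRDD() hands back a JavaRDD<Row>, whose map takes
        // org.apache.spark.api.java.function.Function, so a Java 8 lambda works:
        List<String> names = sqlContext.sql("select name from movies")
            .javaRDD()
            .map(row -> row.getString(0))
            .collect();

        System.out.println(names);
        sc.stop();
    }
}
```

That avoids the Scala Function1/ClassTag pair entirely, though it obviously gives up the DataFrame type and drops you back into plain RDD land for the rest of the pipeline.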