See the Scaladoc for the createDataFrame(rowRDD: RDD[Row], schema: StructType) method:
 * Creates a [[DataFrame]] from an [[RDD]] containing [[Row]]s using the given schema.
 * It is important to make sure that the structure of every [[Row]] of the provided RDD matches
 * the provided schema. Otherwise, there will be a runtime exception.
 * Example:
 * {{{
 *  import org.apache.spark.sql._
 *  import org.apache.spark.sql.types._
 *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 *
 *  val schema =
 *    StructType(
 *      StructField("name", StringType, false) ::
 *      StructField("age", IntegerType, true) :: Nil)
 *
 *  val people =
 *    sc.textFile("examples/src/main/resources/people.txt").map(
 *      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
 *  val dataFrame = sqlContext.createDataFrame(people, schema)
 *  dataFrame.printSchema
 *  // root
 *  // |-- name: string (nullable = false)
 *  // |-- age: integer (nullable = true)
 * }}}

Cheers

On Sun, Dec 20, 2015 at 6:31 AM, Eran Witkon <eranwit...@gmail.com> wrote:
> Hi,
>
> I have an RDD
> jsonGzip
> res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
> MapPartitionsRDD[8] at map at <console>:65
>
> which I want to convert to a DataFrame with a schema,
> so I created a schema:
>
> val schema =
>   StructType(
>     StructField("cty", StringType, false) ::
>     StructField("hse", StringType, false) ::
>     StructField("nm", StringType, false) ::
>     StructField("yrs", StringType, false) :: Nil)
>
> and called
>
> val unzipJSON = sqlContext.createDataFrame(jsonGzip, schema)
> <console>:36: error: overloaded method value createDataFrame with
> alternatives:
>   (rdd: org.apache.spark.api.java.JavaRDD[_], beanClass:
>     Class[_])org.apache.spark.sql.DataFrame <and>
>   (rdd: org.apache.spark.rdd.RDD[_], beanClass:
>     Class[_])org.apache.spark.sql.DataFrame <and>
>   (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],
>     schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
>   (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],
>     schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
> cannot be applied to (org.apache.spark.rdd.RDD[(String, String, String,
> String)], org.apache.spark.sql.types.StructType)
>   val unzipJSON = sqlContext.createDataFrame(jsonGzip, schema)
>
> But as you see I don't have the right RDD type.
>
> So how can I get a DataFrame with the right column names?
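To spell it out: none of the createDataFrame overloads accepts an RDD of tuples together with a StructType, so the tuples have to be converted to Rows first. A sketch, assuming your existing jsonGzip and schema vals and that the tuple fields are in (cty, hse, nm, yrs) order:

```scala
import org.apache.spark.sql.Row

// Convert the RDD[(String, String, String, String)] into an RDD[Row]
// so it matches the (rowRDD: RDD[Row], schema: StructType) overload.
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) =>
  Row(cty, hse, nm, yrs)
}

// Now the field count and types line up with the schema you defined.
val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)
unzipJSON.printSchema()
```

Alternatively, since the elements are tuples, you can skip the explicit schema entirely and name the columns with toDF: `import sqlContext.implicits._; val df = jsonGzip.toDF("cty", "hse", "nm", "yrs")`. Note that toDF infers nullability itself, so you lose the nullable = false you specified in the StructType.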