See the Scaladoc for the createDataFrame(rowRDD: RDD[Row], schema: StructType)
method:

   * Creates a [[DataFrame]] from an [[RDD]] containing [[Row]]s using the
given schema.
   * It is important to make sure that the structure of every [[Row]] of
the provided RDD matches
   * the provided schema. Otherwise, there will be a runtime exception.
   * Example:
   * {{{
   *  import org.apache.spark.sql._
   *  import org.apache.spark.sql.types._
   *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
   *
   *  val schema =
   *    StructType(
   *      StructField("name", StringType, false) ::
   *      StructField("age", IntegerType, true) :: Nil)
   *
   *  val people =
   *    sc.textFile("examples/src/main/resources/people.txt").map(
   *      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
   *  val dataFrame = sqlContext.createDataFrame(people, schema)
   *  dataFrame.printSchema
   *  // root
   *  // |-- name: string (nullable = false)
   *  // |-- age: integer (nullable = true)
   * }}}

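In your case that means mapping each 4-tuple to a Row first, so the RDD matches the RDD[Row] overload above. A minimal sketch along those lines (untested here, reusing the field names from your schema):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StringType, StructType}

val schema =
  StructType(
    StructField("cty", StringType, false) ::
    StructField("hse", StringType, false) ::
    StructField("nm",  StringType, false) ::
    StructField("yrs", StringType, false) :: Nil)

// Convert each (String, String, String, String) tuple into a Row
// before calling createDataFrame, which expects RDD[Row].
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }

val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)
```

The key point is that the order of the fields in each Row must line up with the order of the StructFields in the schema.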
Cheers

On Sun, Dec 20, 2015 at 6:31 AM, Eran Witkon <eranwit...@gmail.com> wrote:

> Hi,
>
> I have an RDD
> jsonGzip
> res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
> MapPartitionsRDD[8] at map at <console>:65
>
> which I want to convert to a DataFrame with schema
> so I created a schema:
>
> val schema =
>   StructType(
>     StructField("cty", StringType, false) ::
>       StructField("hse", StringType, false) ::
>         StructField("nm", StringType, false) ::
>           StructField("yrs", StringType, false) ::Nil)
>
> and called
>
> val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
> <console>:36: error: overloaded method value createDataFrame with 
> alternatives:
>   (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame <and>
>   (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
> Class[_])org.apache.spark.sql.DataFrame <and>
>   (rowRDD: 
> org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
>   (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
>  cannot be applied to (org.apache.spark.rdd.RDD[(String, String, String, 
> String)], org.apache.spark.sql.types.StructType)
>        val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
>
>
> But as you can see, I don't have the right RDD type.
>
> So how can I get a DataFrame with the right column names?
>
>
>
