[jira] [Created] (SPARK-21021) Reading partitioned parquet does not respect specified schema column order

Michel Lemay (JIRA) Thu, 08 Jun 2017 05:27:33 -0700

Michel Lemay created SPARK-21021:
------------------------------------

             Summary: Reading partitioned parquet does not respect specified schema column order
                 Key: SPARK-21021
                 URL: https://issues.apache.org/jira/browse/SPARK-21021
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Michel Lemay
            Priority: Minor



When reading back a partitioned parquet folder with a user-specified schema, the resulting column order does not match the schema: the partition columns are moved to the end.

Consider the following example:

{code:scala}
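import org.apache.spark.sql.types.{StringType, StructField, StructType}
import spark.implicits._  // needed for .toDF; already in scope in spark-shell

case class Event(f1: String, f2: String, f3: String)

// Write a single row, partitioned by f1 and f2.
val df = Seq(Event("v1", "v2", "v3")).toDF
df.write.partitionBy("f1", "f2").parquet("out")

// Read it back with an explicit schema that declares the order f1, f2, f3.
val schema: StructType = StructType(
  StructField("f1", StringType, true) ::
  StructField("f2", StringType, true) ::
  StructField("f3", StringType, true) :: Nil)
val dfRead = spark.read.schema(schema).parquet("out")

// The partition columns f1 and f2 come back after f3.
dfRead.show
+---+---+---+
| f3| f1| f2|
+---+---+---+
| v3| v1| v2|
+---+---+---+

dfRead.columns
Array[String] = Array(f3, f1, f2)

// The schema itself still lists f1, f2, f3.
schema.fields
Array(StructField(f1,StringType,true), StructField(f2,StringType,true), StructField(f3,StringType,true))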
{code}

This makes it hard to keep schemas compatible when reading from multiple sources, because the resulting column order depends on whether a source is partitioned.
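
A possible workaround, which is not part of the original report, is to re-select the columns in the order of the specified schema after reading. The sketch below assumes a local Spark session; the object name ReorderColumns, the helper withSchemaOrder, and the temporary output directory are hypothetical names used only for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReorderColumns {
  // Hypothetical helper (not a Spark API): select the columns of `df`
  // in the order in which `schema` declares them.
  // Assumes simple top-level column names without dots.
  def withSchemaOrder(df: DataFrame, schema: StructType): DataFrame =
    df.select(schema.fieldNames.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-21021-sketch")
      .getOrCreate()
    import spark.implicits._

    // Write one row partitioned by f1 and f2 into a fresh temporary folder.
    val out = Files.createTempDirectory("spark-21021").resolve("out").toString
    Seq(("v1", "v2", "v3")).toDF("f1", "f2", "f3")
      .write.partitionBy("f1", "f2").parquet(out)

    val schema = StructType(Seq(
      StructField("f1", StringType, nullable = true),
      StructField("f2", StringType, nullable = true),
      StructField("f3", StringType, nullable = true)))

    // Whatever order the reader returns, the select restores schema order.
    val dfRead = spark.read.schema(schema).parquet(out)
    val reordered = withSchemaOrder(dfRead, schema)

    println(reordered.columns.mkString(", ")) // prints: f1, f2, f3

    spark.stop()
  }
}
```

The select is harmless when the reader already returns the columns in schema order, so it can be applied uniformly to partitioned and non-partitioned sources.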




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
