[ https://issues.apache.org/jira/browse/SPARK-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728808#comment-14728808 ]
Apache Spark commented on SPARK-10428:
--------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/8583

> Struct fields read from parquet are mis-aligned
> -----------------------------------------------
>
>                 Key: SPARK-10428
>                 URL: https://issues.apache.org/jira/browse/SPARK-10428
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Priority: Critical
>
> {code}
> val df1 = sqlContext
>   .range(1)
>   .selectExpr("NAMED_STRUCT('a', id, 'd', id + 3) AS s")
>   .coalesce(1)
> val df2 = sqlContext
>   .range(1, 2)
>   .selectExpr("NAMED_STRUCT('a', id, 'b', id + 1, 'c', id + 2, 'd', id + 3) AS s")
>   .coalesce(1)
> df1.write.mode("overwrite").parquet("/home/yin/sc_11_minimal/p=1")
> df2.write.mode("overwrite").parquet("/home/yin/sc_11_minimal/p=2")
> {code}
> {code}
> sqlContext.read.option("mergeSchema", "true")
>   .parquet("/home/yin/sc_11_minimal/")
>   .selectExpr("s.a", "s.b", "s.c", "s.d", "p")
>   .show()
> +---+---+----+----+---+
> |  a|  b|   c|   d|  p|
> +---+---+----+----+---+
> |  0|  3|null|null|  1|
> |  1|  2|   3|   4|  2|
> +---+---+----+----+---+
> {code}
> Looks like the problem is at
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala#L185-L204:
> we do padding when the global schema has more struct fields than the local
> Parquet file's schema. However, when we read a field from Parquet, we still
> use the Parquet file's local schema, so we put the value of {{d}} into the
> wrong slot.
> I tried master. This issue looks resolved by
> https://github.com/apache/spark/pull/8509. We need to decide whether to
> back-port that to branch 1.5.
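To illustrate the bug described above: the fix requires placing each value read from the file into the slot of its field *name* in the merged global schema, rather than at its ordinal position in the file's local schema. The sketch below is a hypothetical, simplified model (names `StructAlignmentSketch` and `alignRow` are made up for illustration; this is not Spark's actual `CatalystRowConverter` code):

```scala
// Hypothetical sketch of name-based struct field alignment. A Parquet file
// whose local struct schema is a subset of the merged global schema must
// have each of its values routed to the global slot with the matching
// field name; missing fields stay null (the "padding").
object StructAlignmentSketch {
  def alignRow(globalFields: Seq[String],
               localFields: Seq[String],
               localValues: Seq[Any]): Seq[Any] = {
    // Start from a padded row of nulls sized to the global schema.
    val row = Array.fill[Any](globalFields.length)(null)
    // Route each local value to the slot of its field name, not its
    // local ordinal. Writing by local ordinal is exactly the bug: for a
    // local schema (a, d) against global (a, b, c, d), the value of d
    // would land in slot 1 (field b) instead of slot 3.
    localFields.zip(localValues).foreach { case (name, value) =>
      row(globalFields.indexOf(name)) = value
    }
    row.toSeq
  }

  def main(args: Array[String]): Unit = {
    val global = Seq("a", "b", "c", "d")
    // df1's file only contains fields a and d, with values 0 and 3.
    println(StructAlignmentSketch.alignRow(global, Seq("a", "d"), Seq(0L, 3L)))
  }
}
```

With ordinal-based placement the row above would come out as (0, 3, null, null), matching the wrong output shown in the issue; name-based placement yields (0, null, null, 3).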
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org