[ https://issues.apache.org/jira/browse/SPARK-39806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ala Luszczak updated SPARK-39806:
---------------------------------
    Description:
There is a problem with a projection we use in `FileScanRDD` to join the metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the expressions down the line return a malformed row. The errors crash the query, but the exact message can vary (for example: failed assertion on number of fields in the row, accessing field of incorrect type).

This defect affects only readers producing rows, and only data sets using dynamic partitioning.

  was:
There is a problem with a projection we use in `FileScanRDD` to join the metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the expressions down the line return a malformed row. The errors crash the query, but the exact message can vary (for example: failed assertion on number of fields in the row, accessing field of incorrect type).

This defect affect only readers producing rows, and only data sets using dynamic partitioning.


> Queries accessing METADATA struct crash on partitioned tables
> -------------------------------------------------------------
>
>                 Key: SPARK-39806
>                 URL: https://issues.apache.org/jira/browse/SPARK-39806
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Ala Luszczak
>            Priority: Major
>
> There is a problem with a projection we use in `FileScanRDD` to join the
> metadata row to the row produced by the reader.
> https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133
> The issue is that the projection omits partition columns. As a result, the
> expressions down the line return a malformed row. The errors crash the query,
> but the exact message can vary (for example: failed assertion on number of
> fields in the row, accessing field of incorrect type).
> This defect affects only readers producing rows, and only data sets using
> dynamic partitioning.
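
The failure mode described in the issue can be illustrated with a small plain-Python model. This is only a sketch and is not Spark's code: the schema, the function names, and the assumed row layout (data columns, then partition values, then metadata fields) are all hypothetical. It shows how a join that copies only the data columns before appending the metadata yields a row that is both too short and shifted, which matches the two kinds of error messages mentioned above.

```python
# Simplified model only; names and row layout are hypothetical, not Spark's API.
DATA_FIELDS = (("id", int),)
PARTITION_FIELDS = (("year", int),)
METADATA_FIELDS = (("file_path", str),)
SCHEMA = DATA_FIELDS + PARTITION_FIELDS + METADATA_FIELDS


def join_omitting_partitions(reader_row, metadata_row):
    """Model of the defect: copy only the data columns, then the metadata."""
    return tuple(reader_row[:len(DATA_FIELDS)]) + tuple(metadata_row)


def join_keeping_partitions(reader_row, metadata_row):
    """Model of the intended result: data, partition, then metadata columns."""
    return tuple(reader_row) + tuple(metadata_row)


def validate(row):
    """Return the problems a downstream expression could trip over."""
    problems = []
    if len(row) != len(SCHEMA):
        problems.append(f"expected {len(SCHEMA)} fields, got {len(row)}")
    for (name, expected_type), value in zip(SCHEMA, row):
        if not isinstance(value, expected_type):
            problems.append(f"field '{name}' is not {expected_type.__name__}")
    return problems


reader_row = (42, 2022)  # assumed layout: data column, then partition value
metadata_row = ("/data/year=2022/part-0.json",)

bad = join_omitting_partitions(reader_row, metadata_row)
good = join_keeping_partitions(reader_row, metadata_row)

print(validate(bad))   # reports a field-count mismatch and a type mismatch
print(validate(good))  # reports no problems
```

In this model the malformed row fails in more than one way at once, which is consistent with the report that the exact error message varies depending on which downstream check runs first.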