[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-19887:
-------------------------------
    Description: 
The following Spark shell snippet under Spark 2.1 reproduces this issue:
{code}
// Run in spark-shell, which imports spark.implicits._ (needed for toDF and $"...") automatically.
val data = Seq(
  ("p1", 1, 1),
  ("p2", 2, 2),
  (null, 3, 3)
)

// Correct case: Saving partitioned data to the file system.
val path = "/tmp/partitioned"

data.
  toDF("a", "b", "c").
  write.
  mode("overwrite").
  partitionBy("a", "b").
  parquet(path)

spark.read.parquet(path).filter($"a".isNotNull).show(truncate = false)
// +---+---+---+
// |c  |a  |b  |
// +---+---+---+
// |2  |p2 |2  |
// |1  |p1 |1  |
// +---+---+---+

// Incorrect case: Saving partitioned data as a persisted table.
data.
  toDF("a", "b", "c").
  write.
  mode("overwrite").
  partitionBy("a", "b").
  saveAsTable("test_null")

spark.table("test_null").filter($"a".isNotNull).show(truncate = false)
// +---+--------------------------+---+
// |c  |a                         |b  |
// +---+--------------------------+---+
// |3  |__HIVE_DEFAULT_PARTITION__|3  |  <-- This line should not be here
// |1  |p1                        |1  |
// |2  |p2                        |2  |
// +---+--------------------------+---+
{code}

Hive-style partitioned tables use the magic string {{\_\_HIVE_DEFAULT_PARTITION\_\_}} to indicate {{NULL}} partition values in partition directory names. However, for persisted partitioned tables, this magic string is interpreted not as {{NULL}} but as a regular string. As a result, the row written with a {{NULL}} value for {{a}} is not removed by the {{isNotNull}} filter.

> __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in
> partitioned persisted tables
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19887
>                 URL: https://issues.apache.org/jira/browse/SPARK-19887
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Cheng Lian
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
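To make the expected behavior concrete, here is a minimal, self-contained Scala sketch of how a Hive-style partition path could be round-tripped. It is not Spark code, and the names {{PartitionPaths}}, {{encodeSegment}} and {{decodePath}} are hypothetical. A {{NULL}} value ({{None}} here) is written as the magic string, and the magic string is mapped back to {{None}} when the path is parsed, so an {{isNotNull}}-style check drops that partition. The sketch assumes partition values contain no {{/}} and deliberately skips the %XX escaping that real implementations apply.

```scala
// Hypothetical sketch, not Spark's implementation: round-tripping Hive-style
// partition directory names, with the magic default-partition string standing in
// for NULL. PartitionPaths, encodeSegment and decodePath are illustrative names.
object PartitionPaths {
  // Hive's sentinel for a NULL partition value in a directory name.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Build one path segment, e.g. "a=p1", or "a=__HIVE_DEFAULT_PARTITION__" for None.
  def encodeSegment(column: String, value: Option[String]): String =
    s"$column=${value.getOrElse(DefaultPartitionName)}"

  // Parse a relative partition path such as "a=p1/b=1" into (column, value) pairs,
  // mapping the sentinel back to None instead of keeping it as a regular string.
  // Assumes values contain no '/' and skips the %XX unescaping real systems perform.
  def decodePath(path: String): List[(String, Option[String])] =
    path.split("/").toList.filter(_.nonEmpty).map { segment =>
      val idx = segment.indexOf('=')
      require(idx > 0, s"not a partition segment: $segment")
      val column = segment.substring(0, idx)
      val raw = segment.substring(idx + 1)
      column -> (if (raw == DefaultPartitionName) None else Some(raw))
    }

  def main(args: Array[String]): Unit = {
    // Same (a, b) partition values as the repro above; None plays the role of null.
    val partitions = List((Some("p1"), 1), (Some("p2"), 2), (None, 3))
    val paths = partitions.map { case (a, b) =>
      encodeSegment("a", a) + "/" + encodeSegment("b", Some(b.toString))
    }
    // Analogue of filter($"a".isNotNull): keep paths whose decoded "a" is defined.
    val notNull = paths.filter(p => decodePath(p).toMap.get("a").exists(_.isDefined))
    notNull.foreach(println) // prints a=p1/b=1 and a=p2/b=2; the NULL partition is dropped
  }
}
```

If {{decodePath}} kept the sentinel as {{Some("__HIVE_DEFAULT_PARTITION__")}}, the filter would retain the third path. That mirrors the incorrect output reported above for {{test_null}}.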