Maxim Gekk created SPARK-34055:
----------------------------------
Summary: ALTER TABLE .. ADD PARTITION doesn't refresh cache
Key: SPARK-34055
URL: https://issues.apache.org/jira/browse/SPARK-34055
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.1, 3.1.0, 3.2.0
Reporter: Maxim Gekk
Assignee: Apache Spark
Fix For: 3.2.0
Here is an example that reproduces the issue:
{code:sql}
spark-sql> create table tbl (col int, part int) using parquet partitioned by (part);
spark-sql> insert into tbl partition (part=0) select 0;
spark-sql> cache table tbl;
spark-sql> select * from tbl;
0 0
spark-sql> show table extended like 'tbl' partition(part=0);
default tbl false Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0
...
{code}
Add a new partition by copying the existing one:
{code}
cp -r \
  /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0 \
  /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=1
{code}
Recover the partitions and query the table again:
{code}
spark-sql> alter table tbl recover partitions;
spark-sql> select * from tbl;
0 0
{code}
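We see only the old data: the row from the new partition part=1 is missing from the result, which suggests the cached table was not refreshed.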
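As a possible workaround until the command refreshes the cache itself, the cached entry can be invalidated by hand. The sketch below assumes that REFRESH TABLE invalidates the cached data for tbl, as the Spark SQL documentation describes. It has not been verified against the affected versions listed above.
{code:sql}
-- Assumption: REFRESH TABLE drops the stale cached data and metadata for tbl,
-- so the next query rescans the files, including the new part=1 directory.
spark-sql> refresh table tbl;
spark-sql> select * from tbl;
{code}
If the table should simply stop being cached, uncache table tbl; is an alternative, at the cost of losing the cache for later queries.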