[
https://issues.apache.org/jira/browse/SPARK-34027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maxim Gekk updated SPARK-34027:
-------------------------------
Description:
Here is an example that reproduces the issue:
{code:sql}
spark-sql> create table tbl (col int, part int) using parquet partitioned by (part);
spark-sql> insert into tbl partition (part=0) select 0;
spark-sql> cache table tbl;
spark-sql> select * from tbl;
0 0
spark-sql> show table extended like 'tbl' partition(part=0);
default tbl false Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl]
Partition Parameters: {transient_lastDdlTime=1609929762, totalSize=424, numFiles=1}
Created Time: Wed Jan 06 13:42:42 MSK 2021
Last Access: UNKNOWN
Partition Statistics: 424 bytes
{code}
Add a new partition by copying the existing one:
{code}
cp -r \
  /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0 \
  /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=1
{code}
Recover the partitions and select from the table:
{code}
spark-sql> alter table tbl recover partitions;
spark-sql> select * from tbl;
0 0
{code}
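The query returns only the old row. The data copied into part=1 is missing because the cached table is not refreshed after the partitions are recovered.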
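A possible workaround on affected versions, which the report itself does not mention, is to invalidate the cached data explicitly after recovering the partitions. The sketch below assumes that REFRESH TABLE (or an UNCACHE TABLE / CACHE TABLE pair) discards the stale cache entry so that the next query rescans the table, including part=1. It has not been checked against this exact scenario:
{code:sql}
spark-sql> alter table tbl recover partitions;
-- Assumption: explicitly invalidate the cached entries for tbl.
spark-sql> refresh table tbl;
-- Alternative (also an assumption): drop and rebuild the cache instead.
-- spark-sql> uncache table tbl;
-- spark-sql> cache table tbl;
spark-sql> select * from tbl;
{code}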
was:
Here is an example that reproduces the issue:
{code:sql}
spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0 0
1 1
spark-sql> ALTER TABLE tbl1 PARTITION (part0 = 0) RENAME TO PARTITION (part0 = 2);
spark-sql> SELECT * FROM tbl1;
0 0
1 1
{code}
> ALTER TABLE .. RECOVER PARTITIONS doesn't refresh cache
> -------------------------------------------------------
>
> Key: SPARK-34027
> URL: https://issues.apache.org/jira/browse/SPARK-34027
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.1, 3.1.0, 3.2.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Labels: correctness
> Fix For: 3.2.0