vinnielhj opened a new issue, #8425: URL: https://github.com/apache/iceberg/issues/8425
### Apache Iceberg version

1.3.1 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

**Environment:** Spark 3.2.1, Iceberg 1.3.1, `org.apache.iceberg.spark.SparkSessionCatalog`

**Description:** I have a Hive table `test.sample`. I run `CALL spark_catalog.system.migrate(table => 'test.sample')` to migrate it to an Iceberg table. Afterwards there are two directories on the file system, `test.db/sample/` and `test.db/sample_backup_/`, and `SHOW TABLES` lists two tables, `test.sample` and `test.sample_backup_`. Once I have verified that the migrated data is correct, I may want to delete the backup table `test.sample_backup_`. When I execute `DROP TABLE test.sample_backup_`, the table is removed, but the `test.db/sample_backup_` directory is deleted from the file system as well. If I then query the Iceberg table `test.sample`, it throws an error that the files cannot be found, because the migrated table still references those data files. I don't think this is reasonable; dropping the backup table should remove only the metadata, not the data files.

**Steps to reproduce:**

1. `CREATE TABLE test.sample (id bigint COMMENT 'unique id', data string) STORED AS PARQUET;`
2. `CALL spark_catalog.system.migrate(table => 'test.sample');`
3. `DROP TABLE test.sample_backup_;`
4. `SELECT * FROM test.sample;` -- fails because the data files were deleted
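Since the backup created by `migrate` is still a Hive table, one possible workaround (a sketch only, not verified against this exact Spark/Iceberg combination) is to convert the backup to an EXTERNAL table before dropping it, so that Hive removes only the metadata and leaves the shared data files in place:

```sql
-- Hedged workaround sketch: marking the backup table EXTERNAL should make
-- DROP TABLE remove only the Hive metadata, not the data files that the
-- migrated Iceberg table test.sample still references.
ALTER TABLE test.sample_backup_ SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
DROP TABLE test.sample_backup_;

-- The migrated table should then still be queryable:
SELECT * FROM test.sample;
```

This relies on standard Hive drop semantics (managed-table drops delete data, external-table drops do not); whether it applies here depends on how the backup table was registered in the metastore.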
