SinghAsDev commented on a change in pull request #3056:
URL: https://github.com/apache/iceberg/pull/3056#discussion_r805396170
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -244,10 +244,19 @@ public SparkTable alterTable(Identifier ident, TableChange... changes) throws No
   @Override
   public boolean dropTable(Identifier ident) {
+    return dropTableInternal(ident, false);
+  }
+
+  @Override
+  public boolean purgeTable(Identifier ident) {
+    return dropTableInternal(ident, true);
+  }
+
+  private boolean dropTableInternal(Identifier ident, boolean purge) {
     try {
       return isPathIdentifier(ident) ?
-          tables.dropTable(((PathIdentifier) ident).location()) :
-          icebergCatalog.dropTable(buildIdentifier(ident));
+          tables.dropTable(((PathIdentifier) ident).location(), purge) :
Review comment:
At Pinterest, we also ran into this issue: Hive table users unexpectedly
drop data when they create an Iceberg table with the `EXTERNAL` table type on
top of existing data (more details in
https://github.com/apache/iceberg/pull/4018). I think this will be a common
accident among orgs trying out Iceberg. I can see how Netflix did not run into
this issue, based on @rdblue's comment, but I don't think all data platforms
using Hive tables drop only the references on `drop table` statements.
I think changing `drop table` so that it never drops data would alleviate our
concern about accidental drops on external tables. However, it also means that
`drop table` on managed tables would leave data around, which is also an
issue. Spark's documentation says that data is dropped as part of a
`drop table` operation when the table is not an external table
(https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-drop-table.html). So
having `drop table` not drop data for managed tables would also be
misleading.
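To make the trade-off concrete, here is a minimal sketch of the three
behaviors discussed above. Everything in it is hypothetical and for
illustration only: `DropTableSemanticsSketch`, `TableType`, `Policy`, and
`deletesData` are not Iceberg or Spark APIs, and the sketch assumes that an
explicit purge always deletes data regardless of table type.

```java
// Hypothetical illustration only; none of these names exist in Iceberg or Spark.
final class DropTableSemanticsSketch {

  enum TableType { MANAGED, EXTERNAL }

  enum Policy {
    // `drop table` deletes data for every table type (the accidental-drop case).
    DROP_ALWAYS_DELETES_DATA,
    // `drop table` never deletes data; only an explicit purge does.
    DROP_NEVER_DELETES_DATA,
    // `drop table` deletes data only for non-external tables,
    // which is what the Spark documentation describes.
    DROP_DELETES_MANAGED_DATA_ONLY
  }

  // Returns true if the data files would be removed under the given policy.
  // Assumption: purge == true always deletes data, whatever the table type.
  static boolean deletesData(Policy policy, TableType type, boolean purge) {
    if (purge) {
      return true;
    }
    switch (policy) {
      case DROP_ALWAYS_DELETES_DATA:
        return true;
      case DROP_NEVER_DELETES_DATA:
        return false;
      case DROP_DELETES_MANAGED_DATA_ONLY:
        return type == TableType.MANAGED;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }

  // Prints the outcome of a plain `drop table` (no purge) for each combination.
  public static void main(String[] args) {
    for (Policy policy : Policy.values()) {
      for (TableType type : TableType.values()) {
        System.out.printf("%s / %s -> deletes data: %b%n",
            policy, type, deletesData(policy, type, false));
      }
    }
  }
}
```

Under these assumptions, only the type-aware policy both protects external
tables and cleans up managed ones on a plain `drop table`; the other two each
fail one of the two cases raised above.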
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]