ayushtkn commented on code in PR #4897:
URL: https://github.com/apache/hive/pull/4897#discussion_r1403921601
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -850,12 +851,44 @@ public void
executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
IcebergTableUtil.performMetadataDelete(icebergTable,
deleteMetadataSpec.getBranchName(),
deleteMetadataSpec.getSarg());
break;
+ case DELETE_ORPHAN_FILES:
Review Comment:
Theoretically yes, it has to store the Path strings. I am not sure how much
memory that can take, but look at Hive Replication: it launches DistCp per
table, so the listing for each table is done in HS2 itself, and that doesn't
choke. It launches MR jobs only for the copy, not for the listing.
I did some benchmarking there: for ~3.8 million it took about 250-300 MB,
but that code path stores some more stuff as well, so here it should be
less.
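As a rough back-of-the-envelope sketch of what those numbers imply per entry:
this assumes memory grows linearly with the number of entries, which is an
assumption and not something the benchmark measured. The class name, method
names, and the 10 million figure are hypothetical, for illustration only.

```java
public class PathListingMemoryEstimate {

    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    // Average bytes held per listed entry, derived from a measured total.
    static double bytesPerEntry(double totalMegabytes, long entries) {
        return totalMegabytes * BYTES_PER_MB / entries;
    }

    // Projected total in MB, assuming linear growth (an assumption).
    static double projectedMegabytes(double bytesPerEntry, long entries) {
        return bytesPerEntry * entries / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long benchmarkedEntries = 3_800_000L;   // from the replication benchmark
        double low = bytesPerEntry(250, benchmarkedEntries);
        double high = bytesPerEntry(300, benchmarkedEntries);
        System.out.printf("per entry: %.1f to %.1f bytes%n", low, high);

        long hypotheticalEntries = 10_000_000L; // hypothetical table size
        System.out.printf("projected: %.0f to %.0f MB%n",
            projectedMegabytes(low, hypotheticalEntries),
            projectedMegabytes(high, hypotheticalEntries));
    }
}
```

Since the replication path stores more than just the path string, these
per-entry figures are an upper bound for the orphan-files listing.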
Yep, we can explore having a Tez job. I don't have a clear idea yet of how we
can get it done via that route, but I will create a follow-up, discuss it with
folks, and figure out a way :-)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]