[ https://issues.apache.org/jira/browse/HIVE-22702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010482#comment-17010482 ]
Peter Vary commented on HIVE-22702:
-----------------------------------
[~michaelchirico]: You did not mention which version of Hive you are using, but I
suspect it does not contain HIVE-6980 and HIVE-19783, where we optimized
drop table. Since these fixes are not in any current release, you could try
them by building Hive yourself.
Thanks,
Peter
> ALTER TABLE REMOVE PARTITION is inefficient
> -------------------------------------------
>
> Key: HIVE-22702
> URL: https://issues.apache.org/jira/browse/HIVE-22702
> Project: Hive
> Issue Type: Improvement
> Components: Database/Schema
> Reporter: Michael Chirico
> Priority: Major
>
> I recently realized the poor partitioning of a table of mine was becoming a
> major bottleneck and endeavored to reset the partitioning.
> At this point, the table had about 56K partitions (year|month|day|city
> combinations); moving to the more efficient year|month partitions means
> there are about 24.
> In the process, I was having trouble fixing the registration of the table
> because of the size of its partition DB; I happened upon this SO Q&A which
> addresses the same issue:
> https://stackoverflow.com/questions/50715939/drop-table-in-hive-via-spark-hangs/50814566#comment105440563_50814566
> I set about batching through ALTER TABLE x DROP PARTITION (...), PARTITION
> (...) 200 at a time; it would run for about 2 hours to accomplish this, which
> strikes me as being quite inefficient.
> (apologies that I haven't done a fully proper analysis of the scaling
> efficiency in this ticket)
> If I were designing it from scratch, I would:
> * Keep the database of existing partitions sorted
> * Sort the incoming partitions to remove
> * Iterate via "shrinking binary search" (each partition is searched with
> binary search, and we can eliminate from the existing DB anything "less than"
> the current index when moving to the next iteration)
> Is there something preventing this from being achieved?
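
The three steps proposed in the description can be sketched in Python as
follows. This is an in-memory illustration only: it does not reflect how the
Hive metastore stores or drops partitions, and the function name
drop_partitions and the tuple representation of a partition are hypothetical.
It assumes partition keys are unique and comparable.

```python
from bisect import bisect_left


def drop_partitions(existing, to_drop):
    """Remove `to_drop` from `existing` using a "shrinking" binary search.

    Both inputs are iterables of comparable partition keys, e.g.
    (year, month, day, city) tuples. Keys in `existing` are assumed unique.
    Returns (kept, missing): the surviving partitions in sorted order and
    the requested partitions that were not found.
    """
    existing = sorted(existing)   # step 1: keep existing partitions sorted
    kept, missing = [], []
    lo = 0                        # everything before `lo` is already handled

    # Step 2: sort the incoming partitions (dropping duplicate requests).
    for target in sorted(set(to_drop)):
        # Step 3: binary search only in existing[lo:], which shrinks
        # as we move through the sorted targets.
        idx = bisect_left(existing, target, lo)
        kept.extend(existing[lo:idx])      # skipped entries survive
        if idx < len(existing) and existing[idx] == target:
            lo = idx + 1                   # found: drop it, move past it
        else:
            missing.append(target)         # not present in `existing`
            lo = idx

    kept.extend(existing[lo:])             # tail after the last target
    return kept, missing
```

For example, calling drop_partitions(existing, to_drop) with 56K existing
keys and a batch of 200 keys to drop performs 200 binary searches over a
range that narrows after each one, instead of a full scan per dropped
partition.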