[
https://issues.apache.org/jira/browse/HIVE-18705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461004#comment-16461004
]
Peter Vary commented on HIVE-18705:
-----------------------------------
{quote}+So here's a question+: should I get rid of the batched scenario as all
the tables are queried and are accessible at a time already, and there's little
reason for me to query them in batches later (for memory reasons) instead of
all of them at once. This way I could have the non-batched (send one dropDB
only) scenario only which doesn't suffer from all the slowing effects I
described above, and is generally 4-5 times faster than the current
implementation.
{quote}
As we discussed offline, I think we should keep the batched scenario. We have
recurring memory problems, and we should strive to remove the places in the
code where we load every table/partition into memory, not introduce new ones :).
Also, it would be a good idea to check whether the close time of the DFSClient
can be shortened.
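
To make the memory argument concrete, here is a rough sketch of what "batched"
means here: only the table names are listed up front, and the full table
objects are loaded at most batchSize at a time. Everything in it (the MetaStore
interface, InMemoryStore, dropDatabaseInBatches) is a hypothetical stand-in
made up for illustration; it is not the HiveMetaStoreClient API and not the
code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDropSketch {

    /** Hypothetical stand-in for a metastore client; NOT the real Hive API. */
    interface MetaStore {
        List<String> listTableNames(String db);                  // names only, cheap
        List<String> loadTables(String db, List<String> names);  // full objects, costly
        void dropTable(String db, String table);
        void dropDatabase(String db);
    }

    /**
     * Drops all tables of a database batch by batch, then the database itself.
     * At most batchSize table objects are held in memory at any time.
     * Returns the number of batches that were processed.
     */
    static int dropDatabaseInBatches(MetaStore ms, String db, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> names = ms.listTableNames(db);
        int batches = 0;
        for (int from = 0; from < names.size(); from += batchSize) {
            int to = Math.min(from + batchSize, names.size());
            // Only this slice of table objects is loaded at once.
            List<String> batch = ms.loadTables(db, names.subList(from, to));
            for (String table : batch) {
                // A real client would run its client-side hooks around this call.
                ms.dropTable(db, table);
            }
            batches++;
        }
        ms.dropDatabase(db);
        return batches;
    }

    /** Tiny in-memory fake (hypothetical) that records the largest load. */
    static class InMemoryStore implements MetaStore {
        final List<String> tables = new ArrayList<>();
        int maxLoaded = 0;
        boolean databaseDropped = false;

        InMemoryStore(int tableCount) {
            for (int i = 0; i < tableCount; i++) {
                tables.add("t" + i);
            }
        }

        public List<String> listTableNames(String db) {
            return new ArrayList<>(tables); // snapshot, unaffected by later drops
        }

        public List<String> loadTables(String db, List<String> names) {
            maxLoaded = Math.max(maxLoaded, names.size());
            return new ArrayList<>(names);
        }

        public void dropTable(String db, String table) {
            tables.remove(table);
        }

        public void dropDatabase(String db) {
            if (!tables.isEmpty()) {
                throw new IllegalStateException("database still has tables");
            }
            databaseDropped = true;
        }
    }
}
```

With 5 tables and a batch size of 2, the fake store sees three loads and never
holds more than two table objects at once. The non-batched variant corresponds
to a single load of all 5.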
> Improve HiveMetaStoreClient.dropDatabase
> ----------------------------------------
>
> Key: HIVE-18705
> URL: https://issues.apache.org/jira/browse/HIVE-18705
> Project: Hive
> Issue Type: Improvement
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major
> Attachments: HIVE-18705.0.patch, HIVE-18705.1.patch,
> HIVE-18705.2.patch, HIVE-18705.4.patch
>
>
> {{HiveMetaStoreClient.dropDatabase}} has a strange implementation to make
> sure client-side hooks are handled (for non-native tables, e.g. HBase).
> Currently it starts by retrieving all the tables from HMS, and then sends
> {{dropTable}} calls to HMS table by table. At the end it sends a
> {{dropDatabase}} call just to be sure :)
> I believe this could be refactored to speed up dropDB in situations where
> the average table count per DB is very high.