[
https://issues.apache.org/jira/browse/IMPALA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971035#comment-16971035
]
Jiawei Wang commented on IMPALA-9139:
-------------------------------------
cc [~stigahuang], please correct me if I am wrong.
I read the code, and the logic appears to be correct. It is definitely
confusing unless you dive into the code, so I will add some comments in the
IMPALA-9110 patch.
The loadInBackground_ flag is actually checked; the check just happens inside
the invalidateDb() function. As far as I can tell, invalidateDb() returns all
the tables under a DB that need to be reloaded.
{code:java}
Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
    dbName, oldDb);
if (invalidatedDb == null) continue;
newDbCache.put(dbName, invalidatedDb.first);
tblsToBackgroundLoad.addAll(invalidatedDb.second);
{code}
That function checks whether we need to reload tables here:
[https://github.com/apache/impala/blob/f9cf70d0352196e7b1b8465d6507ae4b16a3e82c/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1421]
So if loadInBackground_ is set to false, invalidateDb() never adds any tables
to the list it returns, and the background loading in reset() does not
actually add any tables to the tableLoadingDeque_.
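
To make the control flow concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the Impala code: the class ResetSketch, the nested InvalidatedDb type, and the in-memory "metastore" map are hypothetical stand-ins, and it assumes the flag is checked per table inside the per-database helper, so that the helper hands back an empty table list when the flag is off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the catalog; none of these names are Impala's. */
final class ResetSketch {
  /** Result of invalidating one database: its name plus the tables to preload. */
  static final class InvalidatedDb {
    final String dbName;
    final List<String> tablesToLoad;

    InvalidatedDb(String dbName, List<String> tablesToLoad) {
      this.dbName = dbName;
      this.tablesToLoad = tablesToLoad;
    }
  }

  private final boolean loadInBackground;
  private final Map<String, List<String>> metastore; // db name -> table names

  ResetSketch(boolean loadInBackground, Map<String, List<String>> metastore) {
    this.loadInBackground = loadInBackground;
    this.metastore = metastore;
  }

  /** Returns null if the database is unknown; otherwise the flag gates each table. */
  InvalidatedDb invalidateDb(String dbName) {
    List<String> tables = metastore.get(dbName);
    if (tables == null) return null;
    List<String> toLoad = new ArrayList<>();
    for (String table : tables) {
      // The flag check lives here, not in reset().
      if (loadInBackground) toLoad.add(dbName + "." + table);
    }
    return new InvalidatedDb(dbName, toLoad);
  }

  /** Mirrors the shape of the loop in reset(): it never looks at the flag itself. */
  List<String> reset(List<String> allDbs) {
    List<String> tblsToBackgroundLoad = new ArrayList<>();
    for (String dbName : allDbs) {
      InvalidatedDb invalidatedDb = invalidateDb(dbName);
      if (invalidatedDb == null) continue;
      tblsToBackgroundLoad.addAll(invalidatedDb.tablesToLoad);
    }
    return tblsToBackgroundLoad;
  }
}
```

With the flag off, reset() in this sketch returns an empty list even though it unconditionally calls addAll(), which is why the caller looks as if it ignores the flag.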
> Invalidate metadata adds all the tables to background loading pool
> unnecessarily
> --------------------------------------------------------------------------------
>
> Key: IMPALA-9139
> URL: https://issues.apache.org/jira/browse/IMPALA-9139
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Priority: Major
>
> I see the following code in the reset() method of CatalogServiceCatalog
> {code:java}
> // Build a new DB cache, populate it, and replace the existing cache in one
> // step.
> Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>();
> List<TTableName> tblsToBackgroundLoad = new ArrayList<>();
> try (MetaStoreClient msClient = getMetaStoreClient()) {
>   List<String> allDbs = msClient.getHiveClient().getAllDatabases();
>   int numComplete = 0;
>   for (String dbName: allDbs) {
>     if (isBlacklistedDb(dbName)) {
>       LOG.info("skip blacklisted db: " + dbName);
>       continue;
>     }
>     String annotation = String.format("invalidating metadata - %s/%s dbs complete",
>         numComplete++, allDbs.size());
>     try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) {
>       dbName = dbName.toLowerCase();
>       Db oldDb = oldDbCache.get(dbName);
>       Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
>           dbName, oldDb);
>       if (invalidatedDb == null) continue;
>       newDbCache.put(dbName, invalidatedDb.first);
>       tblsToBackgroundLoad.addAll(invalidatedDb.second);
>     }
>   }
> }
> dbCache_.set(newDbCache);
> // Identify any deleted databases and add them to the delta log.
> Set<String> oldDbNames = oldDbCache.keySet();
> Set<String> newDbNames = newDbCache.keySet();
> oldDbNames.removeAll(newDbNames);
> for (String dbName: oldDbNames) {
>   Db removedDb = oldDbCache.get(dbName);
>   updateDeleteLog(removedDb);
> }
> // Submit tables for background loading.
> for (TTableName tblName: tblsToBackgroundLoad) {
>   tableLoadingMgr_.backgroundLoad(tblName);
> }
> {code}
> Notice that the tables above are added to the background load without
> checking the flag {{loadInBackground_}}. This means that even if the flag is
> unset, after we issue an invalidate metadata command, all the tables in the
> system are loaded in the background. Note that this code only loads the
> tables; it does not add the loaded tables to the catalog, which is good,
> because otherwise the memory footprint of the catalog would increase after
> every invalidate metadata command.
> This bug has 2 implications:
> 1. We are obviously wasting a lot of CPU cycles without getting anything out
> of it.
> 2. The more subtle side-effect is that this would fill up the
> {{tableLoadingDeque_}}. This means any other background load task will take a
> longer duration to complete.
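
The second implication in the quoted description can be illustrated with a small sketch. It assumes a plain first-in, first-out queue in which every background load is appended at the tail; LoadQueueSketch and its methods are hypothetical names, and the real tableLoadingDeque_ may order or prioritize entries differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical FIFO model of a background-load queue; not Impala's TableLoadingMgr. */
final class LoadQueueSketch {
  /** Builds a queue holding numTables bulk entries followed by one later request. */
  static Deque<String> queueAfterInvalidate(int numTables, String lateTable) {
    Deque<String> queue = new ArrayDeque<>();
    for (int i = 0; i < numTables; i++) {
      queue.addLast("db.tbl" + i); // bulk entries queued by the invalidate
    }
    queue.addLast(lateTable); // an unrelated background load arriving afterwards
    return queue;
  }

  /** Counts the entries that would be processed before the given table, or -1 if absent. */
  static int tasksAheadOf(Deque<String> queue, String table) {
    int ahead = 0;
    for (String queued : queue) { // ArrayDeque iterates from head to tail
      if (queued.equals(table)) return ahead;
      ahead++;
    }
    return -1;
  }
}
```

Under these assumptions, a load queued after an invalidate of N tables waits behind all N bulk entries, so its delay grows with the number of tables in the system.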
--
This message was sent by Atlassian Jira
(v8.3.4#803005)