[ 
https://issues.apache.org/jira/browse/IMPALA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971347#comment-16971347
 ] 

Vihang Karajgaonkar commented on IMPALA-9139:
---------------------------------------------

Ah, I missed that. You are right. I will close this JIRA as not a bug.

> Invalidate metadata adds all the tables to background loading pool unnecessarily
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-9139
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9139
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> I see the following code in the reset() method of CatalogServiceCatalog:
> {code:java}
>       // Build a new DB cache, populate it, and replace the existing cache in one
>       // step.
>       Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>();
>       List<TTableName> tblsToBackgroundLoad = new ArrayList<>();
>       try (MetaStoreClient msClient = getMetaStoreClient()) {
>         List<String> allDbs = msClient.getHiveClient().getAllDatabases();
>         int numComplete = 0;
>         for (String dbName: allDbs) {
>           if (isBlacklistedDb(dbName)) {
>             LOG.info("skip blacklisted db: " + dbName);
>             continue;
>           }
>           String annotation = String.format("invalidating metadata - %s/%s dbs complete",
>               numComplete++, allDbs.size());
>           try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) {
>             dbName = dbName.toLowerCase();
>             Db oldDb = oldDbCache.get(dbName);
>             Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
>                 dbName, oldDb);
>             if (invalidatedDb == null) continue;
>             newDbCache.put(dbName, invalidatedDb.first);
>             tblsToBackgroundLoad.addAll(invalidatedDb.second);
>           }
>         }
>       }
>       dbCache_.set(newDbCache);
>       // Identify any deleted databases and add them to the delta log.
>       Set<String> oldDbNames = oldDbCache.keySet();
>       Set<String> newDbNames = newDbCache.keySet();
>       oldDbNames.removeAll(newDbNames);
>       for (String dbName: oldDbNames) {
>         Db removedDb = oldDbCache.get(dbName);
>         updateDeleteLog(removedDb);
>       }
>       // Submit tables for background loading.
>       for (TTableName tblName: tblsToBackgroundLoad) {
>         tableLoadingMgr_.backgroundLoad(tblName);
>       }
> {code}
> As the code above shows, the tables are submitted to backgroundLoad without 
> checking the flag {{loadInBackground_}}. This means that even if the flag is 
> unset, every table in the system is loaded in the background after an 
> invalidate metadata command is issued. Note that this code only loads the 
> tables; it does not add the loaded tables to the catalog. That is good, 
> because otherwise the memory footprint of the catalog would grow after every 
> invalidate metadata command.
> This bug has two implications:
> 1. We waste a lot of CPU cycles without getting anything out of it.
> 2. The more subtle side effect is that this fills up the 
> {{tableLoadingDeque_}}, so any other background load task takes longer to 
> complete.
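
An unconditional submission loop like the one quoted above can still honor the flag if the per-database step returns an empty list whenever the flag is unset. The sketch below illustrates that pattern only. All names in it (BackgroundLoadSketch, tablesToBackgroundLoad, submit, queuedCount) are hypothetical, it is not the Impala implementation, and whether invalidateDb() behaves this way is an assumption that this thread does not confirm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a catalog that queues tables for background loading. */
class BackgroundLoadSketch {
  // Plays the role of the loadInBackground_ flag from the issue description.
  private final boolean loadInBackground;
  // Plays the role of tableLoadingDeque_; holds fully qualified table names.
  private final Deque<String> tableLoadingDeque = new ArrayDeque<>();

  BackgroundLoadSketch(boolean loadInBackground) {
    this.loadInBackground = loadInBackground;
  }

  /** Collects the tables of one database, but only when background loading is enabled. */
  List<String> tablesToBackgroundLoad(String dbName, List<String> tableNames) {
    List<String> result = new ArrayList<>();
    if (!loadInBackground) {
      return result;  // Flag unset: the caller receives an empty list.
    }
    for (String tbl : tableNames) {
      result.add(dbName + "." + tbl);
    }
    return result;
  }

  /** Submits everything it is given; no flag check is needed at this point. */
  void submit(List<String> tables) {
    tableLoadingDeque.addAll(tables);
  }

  int queuedCount() {
    return tableLoadingDeque.size();
  }
}
```

With the check placed where the list is built, the loop that submits tables does not need its own guard: when the flag is unset the list is empty and nothing reaches the queue.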



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
