Neer393 commented on code in PR #6020: URL: https://github.com/apache/hive/pull/6020#discussion_r2278767539
##########
standalone-metastore/metastore-client/src/main/java/org/apache/hadoop/hive/metastore/utils/TableFetcher.java:
##########
@@ -102,21 +104,47 @@ public List<TableName> getTables() throws Exception {
     List<String> databases = client.getDatabases(catalogName, dbPattern);

     for (String db : databases) {
-      Database database = client.getDatabase(catalogName, db);
-      if (MetaStoreUtils.checkIfDbNeedsToBeSkipped(database)) {
-        LOG.debug("Skipping table under database: {}", db);
-        continue;
-      }
-      if (MetaStoreUtils.isDbBeingPlannedFailedOver(database)) {
-        LOG.info("Skipping table that belongs to database {} being failed over.", db);
-        continue;
-      }
-      List<String> tablesNames = client.listTableNamesByFilter(catalogName, db, tableFilter, -1);
+      List<String> tablesNames = getTableNamesForDatabase(catalogName, db);
       tablesNames.forEach(tablesName -> candidates.add(TableName.fromString(tablesName, catalogName, db)));
     }
     return candidates;
   }
+  public List<Table> getTables(int maxBatchSize) throws Exception {
+    List<Table> candidates = new ArrayList<>();
+
+    // if tableTypes is empty, then a list with single empty string has to specified to scan no tables.
+    if (tableTypes.isEmpty()) {
+      LOG.info("Table fetcher returns empty list as no table types specified");
+      return candidates;
+    }
+
+    List<String> databases = client.getDatabases(catalogName, dbPattern);
+
+    for (String db : databases) {
+      List<String> tablesNames = getTableNamesForDatabase(catalogName, db);

Review Comment:
The earlier implementation made one msc call to get the table names and then one msc call per table name to fetch the HMS table object.
The newer implementation reduces the msc calls: one msc call is made to get all the table names, and then, using TableIterable, the table objects are fetched in batches, so the number of msc calls for getting table objects becomes ```ceil(number of tables / BATCH_MAX_RETRIEVE)``` (BATCH_MAX_RETRIEVE config value, default is 300).

So in the older implementation ```number of msc calls = 1 + number of tables``` whereas in the newer implementation ```number of msc calls = 1 + ceil(number of tables / BATCH_MAX_RETRIEVE)```

This approach was suggested by @vikramahuja1001. It replaces my previous implementation, which was dropped, in which I had implemented a direct HMS API endpoint like ```listTableNamesByFilter```
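To make the arithmetic concrete, here is a minimal sketch of the two call counts for a single database. The class and method names (`MscCallEstimate`, `perTableCalls`, `batchedCalls`) are hypothetical and are not part of the PR; the batch size of 300 is the default mentioned above, and the batch count is assumed to round up when the table count is not an exact multiple of the batch size.

```java
/**
 * Illustrative only: estimates metastore client (msc) calls for one database.
 * Names here are hypothetical and do not exist in Hive.
 */
public class MscCallEstimate {

    /** Older approach: one call for the table names, then one call per table object. */
    public static long perTableCalls(long tableCount) {
        return 1 + tableCount;
    }

    /** Newer approach: one call for the table names, then one call per batch of table objects. */
    public static long batchedCalls(long tableCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Ceiling division: a partially filled last batch still costs one call.
        long batches = (tableCount + batchSize - 1) / batchSize;
        return 1 + batches;
    }

    public static void main(String[] args) {
        long tables = 1000;   // assumed table count for the example
        int batchSize = 300;  // default batch size quoted in the comment above

        System.out.println("per-table calls: " + perTableCalls(tables));
        System.out.println("batched calls:   " + batchedCalls(tables, batchSize));
    }
}
```

With 1000 tables and a batch size of 300, the per-table approach needs 1001 calls while the batched approach needs 5 (one for the names plus four batches).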