Quanlong Huang created IMPALA-13863:
---------------------------------------
Summary: Show number of loaded tables in metrics
Key: IMPALA-13863
URL: https://issues.apache.org/jira/browse/IMPALA-13863
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
It'd be helpful to show the number of loaded tables (i.e. not IncompleteTable)
in catalogd since there are some mechanisms that will implicitly invalidate
tables, e.g. invalidate_tables_on_memory_pressure, invalidate_tables_timeout_s,
invalidate_metadata_on_event_processing_failure.
If few tables are actually loaded, query performance suffers: many queries
will sit in the CREATED state waiting for catalogd to load the metadata of
their tables. In that case we should tune catalogd, e.g. by bumping the JVM
heap size.
There are several places where we can track the total number of loaded tables:
# While catalogd is collecting catalog updates in getCatalogDelta(), it
iterates through all the tables and can count the loaded ones. However, the
iteration takes time and some tables might change state while it runs.
# When a table is loaded and replaces an IncompleteTable, we bump the count,
and we decrease the count when a loaded table is invalidated.
The 2nd option can show the real-time count in metrics. The 1st option can be
used to improve logging, e.g. by adding a log line saying "saw N tables are
loaded".
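
A minimal sketch of the two options, to make the trade-off concrete. This is
not catalogd code: LoadedTableTracker, TableEntry, put(), remove() and
countLoadedByScan() are hypothetical names, and a plain boolean stands in for
the "is this an IncompleteTable" check. Option 2 keeps a counter that is
adjusted on every state transition; option 1 recomputes the count by scanning.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the catalog's table cache; not Impala's real classes.
public class LoadedTableTracker {
  // Models a catalog entry; loaded == false plays the role of an IncompleteTable.
  static final class TableEntry {
    final String name;
    final boolean loaded;
    TableEntry(String name, boolean loaded) {
      this.name = name;
      this.loaded = loaded;
    }
  }

  private final Map<String, TableEntry> tables = new ConcurrentHashMap<>();
  // Option 2: counter maintained on transitions, cheap to expose as a metric.
  private final AtomicLong numLoaded = new AtomicLong();

  // Adds or replaces an entry and adjusts the counter based on the transition.
  public void put(String name, boolean loaded) {
    TableEntry prev = tables.put(name, new TableEntry(name, loaded));
    boolean wasLoaded = prev != null && prev.loaded;
    if (loaded && !wasLoaded) numLoaded.incrementAndGet();  // incomplete -> loaded
    if (!loaded && wasLoaded) numLoaded.decrementAndGet();  // loaded -> invalidated
  }

  // Drops an entry entirely, e.g. when the table is removed.
  public void remove(String name) {
    TableEntry prev = tables.remove(name);
    if (prev != null && prev.loaded) numLoaded.decrementAndGet();
  }

  // Option 2 read path: constant time.
  public long getNumLoaded() {
    return numLoaded.get();
  }

  // Option 1: count while iterating over all tables. Linear in the number of
  // tables, and entries may change state while the scan is running.
  public long countLoadedByScan() {
    long n = 0;
    for (TableEntry t : tables.values()) {
      if (t.loaded) n++;
    }
    return n;
  }
}
```

In this sketch the map update and the counter update are two separate steps,
so under concurrent writes the counter can briefly lag the map. For a metric
that is probably acceptable; the scan in option 1 gives a way to cross-check
the counter in logs.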