Quanlong Huang created IMPALA-13863:
---------------------------------------

             Summary: Show number of loaded tables in metrics
                 Key: IMPALA-13863
                 URL: https://issues.apache.org/jira/browse/IMPALA-13863
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


It'd be helpful to show the number of loaded tables (i.e. not IncompleteTable) 
in catalogd since there are some mechanisms that will implicitly invalidate 
tables, e.g. invalidate_tables_on_memory_pressure, invalidate_tables_timeout_s, 
invalidate_metadata_on_event_processing_failure.

If few tables are actually loaded, query performance suffers: many queries will 
sit in the CREATED state waiting for catalogd to load the metadata of their 
tables. In that case we should tune catalogd, e.g. by bumping the JVM heap size.

There are several places where we can track the total number of loaded tables:
 # While catalogd is collecting catalog updates in getCatalogDelta(), it 
iterates through all the tables and can count the loaded ones. However, the 
iteration takes time and some tables might change state while it runs.
 # When a table is loaded and replaces an IncompleteTable, we bump the count, 
and we decrease it when a loaded table is invalidated.

The 2nd option can show the real-time count in metrics. The 1st option can be 
used to improve logging, e.g. add a log saying "saw N tables are loaded".
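
A rough sketch of the two options, to make the trade-off concrete. This is not 
catalogd's actual code: LoadedTableCounter, Table, put(), remove() and 
countLoadedByScan() are hypothetical stand-ins, and the "loaded" flag stands in 
for "not an IncompleteTable". It assumes every table replacement and 
invalidation goes through one code path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; these are not Impala's real catalog classes. */
public class LoadedTableCounter {
  /** Stand-in for a catalog table; loaded == false models an IncompleteTable. */
  public static final class Table {
    final String name;
    final boolean loaded;
    public Table(String name, boolean loaded) {
      this.name = name;
      this.loaded = loaded;
    }
  }

  private final Map<String, Table> tables = new HashMap<>();
  private final AtomicInteger numLoadedTables = new AtomicInteger();

  /** Option 2: adjust the counter whenever a table entry is added or replaced. */
  public synchronized void put(Table newTable) {
    Table old = tables.put(newTable.name, newTable);
    boolean wasLoaded = old != null && old.loaded;
    // Incomplete -> loaded: bump the count.
    if (!wasLoaded && newTable.loaded) numLoadedTables.incrementAndGet();
    // Loaded -> incomplete (invalidation): decrease the count.
    if (wasLoaded && !newTable.loaded) numLoadedTables.decrementAndGet();
  }

  /** Dropping a loaded table must also decrease the count. */
  public synchronized void remove(String name) {
    Table old = tables.remove(name);
    if (old != null && old.loaded) numLoadedTables.decrementAndGet();
  }

  /** Cheap read that a metric gauge could expose. */
  public int getNumLoadedTables() {
    return numLoadedTables.get();
  }

  /** Option 1: count by scanning all tables, e.g. to emit a log line. */
  public synchronized int countLoadedByScan() {
    int n = 0;
    for (Table t : tables.values()) {
      if (t.loaded) n++;
    }
    return n;
  }
}
```

The counter stays accurate only if no code path swaps a table without going 
through put() or remove(); the scan in option 1 could double as a periodic 
sanity check on it.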



--
This message was sent by Atlassian Jira
(v8.20.10#820010)