[
https://issues.apache.org/jira/browse/IMPALA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-8937:
----------------------------------
Priority: Major (was: Critical)
> Fine grained table metadata loading on Catalog server
> -----------------------------------------------------
>
> Key: IMPALA-8937
> URL: https://issues.apache.org/jira/browse/IMPALA-8937
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Affects Versions: Impala 2.12.0, Impala 3.3.0
> Reporter: Bharath Vissapragada
> Priority: Major
>
> *Background*:
> Currently, a table _on the Catalog server_ is either in a loaded or an unloaded
> state (IncompleteTable). When the Catalog server starts for the first time, we
> first fetch a list of table names for each database, and every table in this
> list starts as an unloaded table. The table lists are propagated to the
> coordinators so that they know whether a table with a given name exists or
> not and they can start analyzing the queries. No metadata is loaded in the
> incomplete tables (such as schema, ownership, or comments).
> The table metadata is loaded lazily (and the table moves into a loaded state)
> when it is referenced in any query. When a load request comes in, all the
> table metadata is loaded including file block information.
> *Problem:*
> Coordinators need some additional information when analyzing unloaded tables.
> For example: IMPALA-8228. The ownership information is a part of the HMS
> table schema which is not loaded until the table is marked fully loaded.
> While this is not a problem for regular queries (like select * from <tbl>),
> it is an issue with queries like "show tables" which do not trigger a table
> load. In this particular case, due to the lack of ownership information, the
> output of the table listing could be different depending on whether the table
> is loaded. Another example is IMPALA-8606 where the GET_TABLES request does
> not return the table comments because they are not available for unloaded
> tables.
> *Ask:*
> We need to consider finer grained loading on the Catalog server in general.
> Instead of having a binary state (loaded vs unloaded), the table could be in
> a partially loaded state. We could also start with aggressively fetching
> certain pieces of information that we think could aid with analysis and
> lazily load the remaining pieces of metadata. Finer grained loading also
> integrates well with the LocalCatalog implementation on the coordinators
> where the entire table need not be loaded on the Catalog server to serve
> partial metadata (e.g., show partitions <large-table>).
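
The partially loaded state described in the Ask can be illustrated with a small sketch. It is not Impala code: the names `LoadLevel`, `TableEntry`, `SketchCatalog`, `fetch_summary` and `fetch_full` are all hypothetical, and the three-level split is only one possible reading of "finer grained". The idea shown is that a table listing raises a table to a cheap summary level (owner and comment), while the expensive file block load happens only when a query references the table.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List, Optional, Tuple


class LoadLevel(IntEnum):
    """Hypothetical load levels; the issue only asks for 'partially loaded'."""
    NAME_ONLY = 0  # today's unloaded state (IncompleteTable)
    SUMMARY = 1    # cheap fields fetched eagerly: owner, comment
    FULL = 2       # everything, including file block information


@dataclass
class TableEntry:
    name: str
    level: LoadLevel = LoadLevel.NAME_ONLY
    owner: Optional[str] = None
    comment: Optional[str] = None
    file_blocks: Optional[List[str]] = None


class SketchCatalog:
    """Toy catalog; the fetch callbacks stand in for metastore/filesystem calls."""

    def __init__(self,
                 fetch_summary: Callable[[str], Tuple[str, str]],
                 fetch_full: Callable[[str], List[str]]) -> None:
        self._fetch_summary = fetch_summary
        self._fetch_full = fetch_full
        self._tables: Dict[str, TableEntry] = {}

    def add_table(self, name: str) -> None:
        # Startup: only the table name is known.
        self._tables[name] = TableEntry(name)

    def ensure(self, name: str, wanted: LoadLevel) -> TableEntry:
        """Raise the table to at least `wanted`, loading only what is missing."""
        t = self._tables[name]
        if t.level < LoadLevel.SUMMARY <= wanted:
            t.owner, t.comment = self._fetch_summary(name)
            t.level = LoadLevel.SUMMARY
        if t.level < LoadLevel.FULL <= wanted:
            t.file_blocks = self._fetch_full(name)
            t.level = LoadLevel.FULL
        return t

    def show_tables(self) -> List[Tuple[str, Optional[str], Optional[str]]]:
        # A listing needs owner and comment, but never file blocks.
        rows = []
        for name in sorted(self._tables):
            t = self.ensure(name, LoadLevel.SUMMARY)
            rows.append((t.name, t.owner, t.comment))
        return rows
```

With this shape, the output of a listing no longer depends on whether a table happened to be fully loaded earlier, because every listed table is brought to the same summary level first.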
--
This message was sent by Atlassian Jira
(v8.3.4#803005)