[ 
https://issues.apache.org/jira/browse/IMPALA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8937:
----------------------------------
    Priority: Major  (was: Critical)

> Fine grained table metadata loading on Catalog server
> -----------------------------------------------------
>
>                 Key: IMPALA-8937
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8937
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.3.0
>            Reporter: Bharath Vissapragada
>            Priority: Major
>
> *Background*:
> Currently a table _on the Catalog server_ is in either a loaded or an 
> unloaded state (IncompleteTable). When the Catalog server starts for the 
> first time, it fetches a list of table names for each database, and every 
> table in this list starts as an unloaded table. The table lists are 
> propagated to the coordinators so that they know whether a table with a 
> given name exists and can start analyzing queries. No metadata (schema, 
> ownership, comments, etc.) is loaded for incomplete tables.
> The table metadata is loaded lazily (and the table moves into a loaded state) 
> when it is referenced in any query. When a load request comes in, all the 
> table metadata is loaded, including file block information. 
> *Problem:* 
> Coordinators need some additional information when analyzing unloaded tables. 
> One example is IMPALA-8228: the ownership information is part of the HMS 
> table schema, which is not loaded until the table is marked fully loaded. 
> While this is not a problem for regular queries (like select * from <tbl>), 
> it is an issue for statements like "show tables", which do not trigger a 
> table load. In this particular case, due to the lack of ownership 
> information, the output of the table listing can differ depending on whether 
> the table is loaded. Another example is IMPALA-8606, where the GET_TABLES 
> request does not return the table comments because they are not available 
> for unloaded tables.
> *Ask:*
> We need to consider finer grained loading on the Catalog server in general. 
> Instead of having a binary state (loaded vs unloaded), the table could be in 
> a partially loaded state. We could also start with aggressively fetching 
> certain pieces of information that we think could aid with analysis and 
> lazily load the remaining pieces of metadata. Finer grained loading also 
> integrates well with the LocalCatalog implementation on the coordinators 
> where the entire table need not be loaded on the Catalog server to serve 
> partial metadata (e.g., show partitions <large-table>).
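
One way to picture the partially loaded state asked for above is a table that tracks which metadata pieces it already holds. The following is only a minimal sketch under stated assumptions: the class PartiallyLoadedTable, the Piece enum, and the choice of eagerly fetched pieces are hypothetical names for illustration and are not part of Impala's Catalog code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Impala.
final class PartiallyLoadedTable {
  // Metadata pieces that could be loaded independently of each other.
  enum Piece { OWNER, COMMENT, SCHEMA, PARTITIONS, FILE_BLOCKS }

  // Assumed set of pieces fetched eagerly, because statements such as
  // "show tables" need them without triggering a table load.
  static final Set<Piece> EAGER = EnumSet.of(Piece.OWNER, Piece.COMMENT);

  private final String name;
  private final EnumSet<Piece> loaded = EnumSet.noneOf(Piece.class);

  PartiallyLoadedTable(String name) {
    this.name = name;
    load(EAGER);
  }

  // Loads only the requested pieces that are still missing and returns
  // the pieces that were actually fetched by this call.
  synchronized Set<Piece> load(Set<Piece> wanted) {
    EnumSet<Piece> missing = EnumSet.noneOf(Piece.class);
    missing.addAll(wanted);
    missing.removeAll(loaded);
    // A real loader would fetch each missing piece from its source here.
    loaded.addAll(missing);
    return missing;
  }

  synchronized boolean has(Piece piece) {
    return loaded.contains(piece);
  }

  // Equivalent to today's binary "loaded" state.
  synchronized boolean isFullyLoaded() {
    return loaded.containsAll(EnumSet.allOf(Piece.class));
  }

  String name() {
    return name;
  }
}
```

In this sketch a request like show partitions would call load(EnumSet.of(Piece.PARTITIONS)) and leave file block information unloaded, while a regular query would request every piece and end up in the fully loaded state.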



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
