[ 
https://issues.apache.org/jira/browse/IMPALA-13154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891212#comment-17891212
 ] 

Quanlong Huang commented on IMPALA-13154:
-----------------------------------------

We don't support partially loaded tables in catalogd yet (IMPALA-8937). 
Currently, if the table is loaded, it can only be fully loaded, which means all 
the partitions are loaded.

So at the end of 
[HdfsTable.load()|https://github.com/apache/impala/blob/d2cb00cecea1883955a5bc4be997567bb04be0f8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1353],
 we can always access the full metadata, no matter whether it's doing an 
incremental reload. I think we can do the same for IcebergTable.load(), and it's 
OK to use a separate JIRA.
{quote}Could we still set the metrics in HdfsTable.getTHdfsTable(), but change 
the code to calculate estimatedMetadataSize for 'type' not equal to 
ThriftObjectType.FULL?
{quote}
This seems like a simpler solution. Although HdfsTable.getTHdfsTable() is not 
used in processing getPartialCatalogRequests (which LocalCatalog mode 
coordinators use to fetch metadata from catalogd), it is used in the code path 
of collecting catalog updates (from CatalogServiceCatalog.getCatalogDelta(), 
where 'type' is DESCRIPTOR_ONLY in LocalCatalog mode). So I think it works.
{quote}Do we need to re-calculate fileMetadataStats_ in HdfsTable.load()?
{quote}
Yeah, at least it's used in calculating 'memUsageEstimate': 
[https://github.com/apache/impala/blob/d2cb00cecea1883955a5bc4be997567bb04be0f8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2502-L2503]
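To illustrate the idea of updating the "largest tables" list at load time 
rather than in getTHdfsTable(), here is a minimal, self-contained Java sketch. 
The class and method names (LoadedTable, TopNLargestTables, update()) are 
hypothetical stand-ins, not Impala's actual APIs; Impala's real tracking lives 
elsewhere in catalogd. The point is only the pattern: calling update() at the 
end of load() counts every loaded table, regardless of which thrift 
serialization path it later takes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

// Hypothetical stand-in for a loaded table with a metadata size estimate.
class LoadedTable {
  final String name;
  final long estimatedMetadataSize;
  LoadedTable(String name, long size) {
    this.name = name;
    this.estimatedMetadataSize = size;
  }
}

// Keeps the N tables with the highest estimated metadata size. Updating it at
// the end of load() (instead of in getTHdfsTable()) means a table shows up in
// the top-N list as soon as its metadata is loaded, e.g. after a DESCRIBE.
class TopNLargestTables {
  private final int capacity;
  // Min-heap: the smallest of the current top-N sits at the head.
  private final PriorityQueue<LoadedTable> heap =
      new PriorityQueue<>(Comparator.comparingLong(t -> t.estimatedMetadataSize));

  TopNLargestTables(int capacity) { this.capacity = capacity; }

  // Called once a table finishes loading and its size estimate is known.
  void update(LoadedTable t) {
    heap.removeIf(x -> x.name.equals(t.name));  // replace stale entry on reload
    heap.add(t);
    if (heap.size() > capacity) heap.poll();    // evict the smallest
  }

  // Largest first, as the WebUI table would render them.
  List<String> names() {
    return heap.stream()
        .sorted(Comparator
            .comparingLong((LoadedTable t) -> t.estimatedMetadataSize)
            .reversed())
        .map(t -> t.name)
        .collect(Collectors.toList());
  }
}

public class TopNDemo {
  public static void main(String[] args) {
    TopNLargestTables topN = new TopNLargestTables(2);
    topN.update(new LoadedTable("tbl_a", 100));
    topN.update(new LoadedTable("tbl_b", 300));
    topN.update(new LoadedTable("tbl_c", 200));
    System.out.println(topN.names());  // prints [tbl_b, tbl_c]
  }
}
```

A min-heap keeps update() at O(log N) per load (ignoring the linear stale-entry 
scan, which a name-to-entry map would remove); the same effect could also come 
from recomputing the list on each WebUI request if loads are rare.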

> Some tables are missing in Top-N Tables with Highest Memory Requirements
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-13154
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13154
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Major
>              Labels: catalog-2024
>
> In the /catalog page of catalogd WebUI, there is a table for "Top-N Tables 
> with Highest Memory Requirements". However, not all tables are counted there. 
> E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata 
> loading on it. When it's done, the table is not shown in the WebUI.
> The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 
> 'type' is ThriftObjectType.FULL:
> [https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]
> This used to be the place that all code paths using the table would go 
> through. However, we've done a bunch of optimizations to avoid getting the 
> FULL thrift object of the table, especially in LocalCatalog mode. We should 
> move the code that updates the list of largest tables somewhere that all 
> table usages can reach, e.g. after loading the metadata of the table, we can 
> update its estimatedMetadataSize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
