[ 
https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686946#comment-16686946
 ] 

bharath v commented on IMPALA-6897:
-----------------------------------

Fair point. I propose that we add "Top-n tables with the highest number of 
files" under <catalogd-host>:25020/catalog, as we already do for "Tables with 
Highest Memory Requirements" and "Tables with Highest Number of Metadata 
Operations", unless someone has an alternative suggestion.
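
As a rough illustration of the ranking such a section would need, here is a 
minimal Python sketch. It is not catalogd code: the function name 
top_n_by_file_count and the sample table names and counts are hypothetical, 
and it assumes the per-table file counts are already available as a mapping.

```python
import heapq


def top_n_by_file_count(file_counts, n):
    """Return the n (table_name, file_count) pairs with the most files.

    file_counts is assumed to be a dict mapping "<db>.<tbl>" to a file count.
    """
    return heapq.nlargest(n, file_counts.items(), key=lambda kv: kv[1])


# Hypothetical counts, for illustration only.
sample = {
    "db1.events": 120000,
    "db1.users": 40,
    "db2.logs": 98000,
    "db2.dim": 3,
}
top = top_n_by_file_count(sample, 2)
```

heapq.nlargest avoids sorting every table when n is small relative to the 
number of tables in the catalog.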

Also, FWIW, we already expose file metrics for loaded tables at 
<catalogd-host>:25020/table_metrics?name=<db>.<tbl>. So, given a loaded table 
name, we should be able to find its file count easily.
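
For example, a small Python sketch of querying that page for one table. The 
helper names, the example host, and the timeout are assumptions; only the 
port and the name=<db>.<tbl> parameter come from the URL above. The format of 
the response body is not assumed here, so it is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def table_metrics_url(catalogd_host, db, tbl, port=25020):
    """Build the table_metrics URL for a loaded table."""
    query = urlencode({"name": f"{db}.{tbl}"})
    return f"http://{catalogd_host}:{port}/table_metrics?{query}"


def fetch_table_metrics(catalogd_host, db, tbl, port=25020, timeout=10):
    """Fetch the metrics page and return its body as text, unparsed."""
    url = table_metrics_url(catalogd_host, db, tbl, port)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (hypothetical host and table; needs a reachable catalogd):
#   print(fetch_table_metrics("catalogd.example.com", "functional", "alltypes"))
```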

> Catalog server should flag tables with large number of small files
> ------------------------------------------------------------------
>
>                 Key: IMPALA-6897
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6897
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.13.0
>            Reporter: bharath v
>            Priority: Major
>              Labels: ramp-up, supportability
>
> Since Catalog has all the file metadata information available, it should help 
> flag tables with a large number of small files. This information can be 
> propagated to the coordinators and should be reflected in the query profiles, 
> as we do for "missing stats".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)