[ 
https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674572#comment-16674572
 ] 

bharath v commented on IMPALA-6897:
-----------------------------------

[~tarmstrong] Do you mean an aggregate across all scan nodes? We already have 
something per scan node:
{noformat}
00:SCAN HDFS [default.customers, RANDOM]
   partitions=1/1 files=1 size=15.44KB <===
   stored statistics:
     table: rows=0 size=15.44KB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=0
   mem-estimate=1.00MB mem-reservation=16.00KB thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=0
   in pipelines: 00(GETNEXT)
{noformat}
Just to be clear, my original intention when I first created this jira was to 
be able to find the top 'n' tables (across all dbs) with the most files.
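
A minimal sketch of that top 'n' lookup, assuming the per-table file metadata 
is available as a mapping from (db, table) to a list of file sizes in bytes. 
This is illustrative only and is not Impala code: the mapping shape and the 
function name are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical input shape: (db, table) -> sizes in bytes of the table's files.
TableFiles = Dict[Tuple[str, str], List[int]]


def top_n_tables_by_file_count(
    files: TableFiles, n: int
) -> List[Tuple[str, str, int]]:
    """Return up to n (db, table, file_count) rows, largest file count first."""
    counts = ((db, tbl, len(sizes)) for (db, tbl), sizes in files.items())
    # nlargest keeps only n candidates in memory instead of sorting every table.
    return heapq.nlargest(n, counts, key=lambda row: row[2])


# Made-up sample data spanning two databases.
sample: TableFiles = {
    ("default", "customers"): [15810],
    ("sales", "events"): [4096] * 5000,
    ("sales", "orders"): [134217728] * 12,
}
top_two = top_n_tables_by_file_count(sample, 2)
```

Using a bounded heap rather than a full sort keeps the cost low when the 
catalog tracks many tables and only a handful need to be reported.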

> Catalog server should flag tables with large number of small files
> ------------------------------------------------------------------
>
>                 Key: IMPALA-6897
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6897
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.13.0
>            Reporter: bharath v
>            Priority: Major
>              Labels: ramp-up, supportability
>
> Since the Catalog has all the file metadata available, it should help 
> flag tables with a large number of small files. This information can be 
> propagated to the coordinators and reflected in the query profiles, 
> as we do for "missing stats".


