[ https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674572#comment-16674572 ]
bharath v commented on IMPALA-6897:
-----------------------------------
[~tarmstrong] Do you mean aggregated across all scan nodes? We already report
the file count per scan node:
{noformat}
00:SCAN HDFS [default.customers, RANDOM]
   partitions=1/1 files=1 size=15.44KB <===
   stored statistics:
     table: rows=0 size=15.44KB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=0
   mem-estimate=1.00MB mem-reservation=16.00KB thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=0
   in pipelines: 00(GETNEXT)
{noformat}
To be clear, my original intention when I created this JIRA was to be able
to find the top 'n' tables (across all databases) with the largest number of files.
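A minimal sketch of that "top 'n' tables by file count" lookup, assuming a hypothetical input that maps each fully qualified table name ("db.table") to the number of files loaded for it. The function name, the input shape, and the sample counts below are illustrative only; this is not an existing Impala or catalog API.

```python
import heapq
from typing import Dict, List, Tuple


def top_n_tables_by_file_count(
    file_counts: Dict[str, int], n: int
) -> List[Tuple[str, int]]:
    """Return the n tables with the most files, largest count first.

    file_counts is a hypothetical mapping of "db.table" -> file count.
    Ties on file count are broken alphabetically by table name so the
    result is deterministic.
    """
    # nsmallest on (-count, name) yields the highest counts first and,
    # among equal counts, the alphabetically smallest names first.
    return heapq.nsmallest(n, file_counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical sample counts, not real data.
    counts = {
        "default.customers": 1,
        "sales.orders": 12000,
        "logs.events": 48000,
        "tmp.scratch": 12000,
    }
    print(top_n_tables_by_file_count(counts, 3))
    # [('logs.events', 48000), ('sales.orders', 12000), ('tmp.scratch', 12000)]
```

A heap-based selection avoids fully sorting every table when only a handful are wanted; for a catalog with many tables and a small 'n', that keeps the lookup cheap.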
> Catalog server should flag tables with large number of small files
> ------------------------------------------------------------------
>
> Key: IMPALA-6897
> URL: https://issues.apache.org/jira/browse/IMPALA-6897
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.13.0
> Reporter: bharath v
> Priority: Major
> Labels: ramp-up, supportability
>
> Since the Catalog has all the file metadata available, it should help
> flag tables with a large number of small files. This information can be
> propagated to the coordinators and should be reflected in the query profiles,
> as we already do for "missing stats".