[ https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461966#comment-16461966 ]
Vuk Ercegovac commented on IMPALA-6897:
---------------------------------------
Several thoughts on this one:
* "too small": define it in terms of the work done per file, e.g., the fixed
per-file overhead vs. the work spent on the data itself? A file could count as
"too small" once that overhead-to-work ratio exceeds some threshold.
* "too many": perhaps define it relative to max parallelism per node * number
of nodes (scaled by some multiplier)?
* Both the producer (e.g., insert-select) and the consumer (e.g., select)
should get warnings, but for different reasons: the producer is making things
worse, whereas for the consumer the warning manages its performance
expectations.
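The first two heuristics could be combined roughly as follows. This is a minimal sketch under stated assumptions: the per-file overhead, scan throughput, ratio cutoff (0.1) and multiplier (4) are hypothetical values, and `CostModel`, `is_too_small`, `too_many_threshold` and `should_flag` are illustrative names, not Impala APIs.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical fixed cost per file (open, metadata read, scan-range setup), in ms.
    per_file_overhead_ms: float
    # Hypothetical scan throughput on the file's data, in bytes per ms.
    bytes_per_ms: float


def overhead_ratio(file_size_bytes: int, cost: CostModel) -> float:
    """Ratio of fixed per-file overhead to the time spent on the file's data."""
    data_ms = file_size_bytes / cost.bytes_per_ms
    if data_ms == 0:
        # An empty file is pure overhead.
        return float("inf")
    return cost.per_file_overhead_ms / data_ms


def is_too_small(file_size_bytes: int, cost: CostModel, max_ratio: float = 0.1) -> bool:
    # "Too small": overhead dominates beyond the chosen ratio.
    return overhead_ratio(file_size_bytes, cost) > max_ratio


def too_many_threshold(max_parallelism_per_node: int, num_nodes: int,
                       multiplier: int = 4) -> int:
    # "Too many": relative to cluster-wide parallelism, scaled by a multiplier.
    return max_parallelism_per_node * num_nodes * multiplier


def should_flag(file_sizes, cost: CostModel, max_parallelism_per_node: int,
                num_nodes: int, max_ratio: float = 0.1, multiplier: int = 4) -> bool:
    # Flag the table when the count of too-small files exceeds the threshold.
    num_small = sum(1 for size in file_sizes if is_too_small(size, cost, max_ratio))
    return num_small > too_many_threshold(max_parallelism_per_node, num_nodes, multiplier)
```

With a 5 ms per-file overhead and 100,000 bytes/ms throughput, a 1 MB file spends 10 ms on data (ratio 0.5), so it counts as too small; on 10 nodes with a parallelism of 8 per node, the table is flagged once it has more than 320 such files.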
> Catalog server should flag tables with large number of small files
> ------------------------------------------------------------------
>
> Key: IMPALA-6897
> URL: https://issues.apache.org/jira/browse/IMPALA-6897
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.13.0
> Reporter: bharath v
> Priority: Major
> Labels: ramp-up, supportability
>
> Since the Catalog has all the file metadata available, it should help flag
> tables with a large number of small files. This information can be
> propagated to the coordinators and reflected in the query profiles, as we
> already do for "missing stats".
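The split proposed in the description (the catalog computes a per-table flag from file metadata it already holds, and the coordinator surfaces it in the profile much like the "missing stats" warning) might look like the sketch below. It assumes a fixed small-file size cutoff and a fixed count threshold, both hypothetical parameters; `TableFileSummary`, `summarize_table`, `small_files_warning` and the warning text are illustrative, not Impala's actual code or profile wording.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class TableFileSummary:
    """Hypothetical per-table summary the catalog would propagate to coordinators."""
    table: str
    num_files: int
    num_small_files: int
    many_small_files: bool


def summarize_table(table: str, file_sizes: List[int],
                    small_file_bytes: int, max_small_files: int) -> TableFileSummary:
    # Catalog side: every file's size is already known, so the count is cheap.
    num_small = sum(1 for size in file_sizes if size < small_file_bytes)
    return TableFileSummary(table, len(file_sizes), num_small,
                            num_small > max_small_files)


def small_files_warning(scanned_tables: Iterable[str],
                        summaries: Dict[str, TableFileSummary]) -> Optional[str]:
    # Coordinator side: check the propagated flags for the tables this query scans.
    flagged = sorted(t for t in set(scanned_tables)
                     if t in summaries and summaries[t].many_small_files)
    if not flagged:
        return None
    return ("WARNING: The following tables have a large number of small files: "
            + ", ".join(flagged))
```

Keeping the counting in the catalog means coordinators only receive a small per-table summary instead of re-scanning file lists for every query.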
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)