[ https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461966#comment-16461966 ]

Vuk Ercegovac commented on IMPALA-6897:
---------------------------------------

Several thoughts on this one:
 * "too small": define it in terms of the work done per file, i.e. the fixed
overhead per file vs. the work on the data? A file could count as "too small"
once the ratio of overhead to data work exceeds some threshold.
 * "too many": perhaps it should depend on max parallelism per node * number of
nodes (with a multiplier)?
 * Both the producer (e.g., insert-select) and the consumer (e.g., select)
should get warnings, but for different reasons: the producer is making things
worse, whereas for the consumer the warning only manages expectations about
performance.
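The first two bullets can be sketched as a simple cost model. This is a minimal sketch under stated assumptions: `CostModel`, its fields, and the helper functions are hypothetical names, not Impala code, and the cost constants would have to be measured on a real cluster.

```python
from dataclasses import dataclass
from typing import Iterable

BYTES_PER_MB = 1024 * 1024


@dataclass(frozen=True)
class CostModel:
    """Hypothetical per-file cost model; every value is an assumption to be measured."""
    per_file_overhead_ms: float  # assumed fixed cost of opening/scheduling one file
    scan_ms_per_mb: float        # assumed cost of processing 1 MB of file data
    max_overhead_ratio: float    # overhead/data-work ratio above which a file is "too small"


def is_too_small(file_size_bytes: int, cost: CostModel) -> bool:
    """True if the fixed per-file overhead outweighs the data work by more than the allowed ratio."""
    data_ms = file_size_bytes / BYTES_PER_MB * cost.scan_ms_per_mb
    if data_ms == 0:
        return True  # an empty file is pure overhead
    return cost.per_file_overhead_ms / data_ms > cost.max_overhead_ratio


def too_many_threshold(max_parallelism_per_node: int, num_nodes: int, multiplier: int) -> int:
    """Hypothetical limit: small files beyond this count are considered "too many"."""
    return max_parallelism_per_node * num_nodes * multiplier


def should_flag_table(file_sizes: Iterable[int], cost: CostModel,
                      max_parallelism_per_node: int, num_nodes: int,
                      multiplier: int) -> bool:
    """Flag a table whose count of too-small files exceeds the cluster-dependent limit."""
    n_small = sum(1 for size in file_sizes if is_too_small(size, cost))
    return n_small > too_many_threshold(max_parallelism_per_node, num_nodes, multiplier)
```

For example, with a 10 ms per-file overhead, 1 ms of scan work per MB and a maximum ratio of 1.0, a 1 MB file counts as too small while a 64 MB file does not. On 3 nodes with a parallelism of 4 and a multiplier of 2, a table would be flagged once it has more than 24 such files. Tying the limit to cluster size reflects the idea that a larger cluster can absorb more files in parallel.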

> Catalog server should flag tables with large number of small files
> ------------------------------------------------------------------
>
>                 Key: IMPALA-6897
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6897
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.13.0
>            Reporter: bharath v
>            Priority: Major
>              Labels: ramp-up, supportability
>
> Since the Catalog has all the file metadata available, it should flag tables
> with a large number of small files. This information can be propagated to the
> coordinators and reflected in the query profiles, as we already do for
> "missing stats".
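The flow described in the issue could look roughly like the sketch below. The catalog computes the flag once from file metadata it already holds. The coordinator then surfaces the flag in the query profile, much like the missing-stats warning. Everything here is hypothetical: `TableMeta`, `catalog_load`, `coordinator_warning`, the simple byte cutoff and the warning text are illustrative assumptions. They are not Impala's actual classes or profile output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableMeta:
    """Hypothetical table metadata as the catalog might ship it to coordinators."""
    name: str
    file_sizes: List[int]
    many_small_files: bool = False  # computed once on the catalog side


def catalog_load(name: str, file_sizes: List[int],
                 small_file_bytes: int, min_small_files: int) -> TableMeta:
    """Hypothetical catalog-side load: count files below the cutoff and set the flag."""
    n_small = sum(1 for size in file_sizes if size < small_file_bytes)
    return TableMeta(name, list(file_sizes), n_small >= min_small_files)


def coordinator_warning(tables_in_query: List[TableMeta]) -> Optional[str]:
    """Hypothetical coordinator step: build a profile warning for flagged tables, if any."""
    flagged = [t.name for t in tables_in_query if t.many_small_files]
    if not flagged:
        return None
    return ("WARNING: The following tables have a large number of small files:\n"
            + "\n".join(flagged))
```

Computing the flag in the catalog means each coordinator only reads a boolean per table instead of rescanning every file size on each query.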



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
