[ https://issues.apache.org/jira/browse/IMPALA-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-8081:
----------------------------------
Description: Currently we maximise parallelism given the number of input
splits available. This is often a good decision, unless there are very many
small input splits, particularly small files. We could avoid this pathological
behaviour by having a minimum threshold of input bytes per instance (this is
still pretty crude, since file input bytes correlate only loosely with the
amount of work required). (was: The Degree of Parallelism should be
appropriate for the operation being performed. Currently it appears to be
either serial or to use as many resources as possible. Over-parallelizing
can sometimes result in bad performance.)
> Automatically choose mt_dop
> ---------------------------
>
> Key: IMPALA-8081
> URL: https://issues.apache.org/jira/browse/IMPALA-8081
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Janaki Lahorani
> Priority: Major
> Labels: multithreading
>
> Currently we maximise parallelism given the number of input splits available.
> This is often a good decision, unless there are very many small input splits,
> particularly small files. We could avoid this pathological behaviour by
> having a minimum threshold of input bytes per instance (this is still pretty
> crude, since file input bytes correlate only loosely with the amount of work
> required).
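
The threshold idea in the description could be sketched roughly as follows.
This is only an illustration, not Impala's planner code: the function name
choose_dop, its parameters, and the 64 MiB threshold in the usage note are
all hypothetical and do not come from the issue.

```python
def choose_dop(split_sizes_bytes, max_dop, min_bytes_per_instance):
    """Pick a degree of parallelism for one scan (hypothetical helper).

    The result is capped by the configured maximum, by the number of
    input splits, and by how many instances can each be given at least
    min_bytes_per_instance bytes of input.
    """
    if max_dop < 1:
        raise ValueError("max_dop must be at least 1")
    if min_bytes_per_instance < 1:
        raise ValueError("min_bytes_per_instance must be at least 1")

    num_splits = len(split_sizes_bytes)
    if num_splits == 0:
        # Nothing to scan; a single instance is enough.
        return 1

    total_bytes = sum(split_sizes_bytes)
    # Number of instances that can each receive the minimum input size.
    # Always allow at least one instance, even for tiny inputs.
    dop_by_bytes = max(1, total_bytes // min_bytes_per_instance)

    return min(max_dop, num_splits, dop_by_bytes)
```

With an assumed threshold of 64 MiB per instance and max_dop of 16, a
thousand 64 KiB files would run with a single instance instead of 16, while
eight 256 MiB splits would still get one instance per split. As the
description notes, input bytes are only a rough proxy for the work required,
so a real heuristic would probably need more inputs than this.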