[ 
https://issues.apache.org/jira/browse/IMPALA-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163144#comment-17163144
 ] 

Tim Armstrong commented on IMPALA-4746:
---------------------------------------

IMPALA-8125 implements this for write fragments only; that allows controlling 
the number of small files without affecting the rest of the query plan.

> num_nodes should take any value and use that many nodes
> -------------------------------------------------------
>
>                 Key: IMPALA-4746
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4746
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Peter Ebert
>            Priority: Major
>
> Ideally num_nodes should be expanded to allow any # of nodes (up to the total 
> number of Impalads in the cluster).  This would randomly select nodes to run 
> fragments on, prioritizing data locality (where possible).  Locality gets 
> more complex if there are multiple scans; perhaps defaulting to the largest 
> table might work best.
> This would conceivably be helpful in several scenarios:
> * Scale performance testing: could run a query with num_nodes = 10 vs 20 to 
> get a rough idea of scaling performance.
> * Avoid writing small files: if doing an insert on larger clusters (200+ 
> nodes), even fairly large data sets can be spread too thin.  Setting the # of 
> nodes would let you control small files, which then simplifies read query 
> plans and the # of fragments.
> * Control cluster utilization: some queries may perform just as well with 
> fewer nodes, particularly where there are broadcast joins and data would be 
> sent over the network anyway.  There may be some multi-tenancy improvements 
> from restricting the number of nodes some queries can use.
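
The scenarios above could be sketched in impala-shell roughly as follows. Note this is a hedged sketch: today NUM_NODES only accepts 0 (use all applicable nodes, the default) or 1 (run the whole plan on the coordinator); the NUM_NODES=10 form is the proposed, not existing, behaviour, and the table names are hypothetical.

```sql
-- Existing behaviour: 0 = all applicable nodes (default), 1 = coordinator only
SET NUM_NODES=1;
INSERT INTO sales_parquet SELECT * FROM sales_staging;  -- hypothetical tables

-- Proposed behaviour (not implemented): cap the query at 10 nodes,
-- e.g. to avoid spraying small files across a 200-node cluster
SET NUM_NODES=10;
INSERT INTO sales_parquet SELECT * FROM sales_staging;
```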



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
