[ 
https://issues.apache.org/jira/browse/IMPALA-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha closed IMPALA-1073.
------------------------------
    Resolution: Fixed

A clustering hint was added for INSERT and is enabled by default for HDFS tables:

{noformat}
/* +CLUSTERED */ and /* +NOCLUSTERED */ Hints
/* +CLUSTERED */ sorts data by the partition columns before inserting to ensure
that only one partition is written at a time per node. Use this hint to reduce
the number of files kept open and the number of buffers kept in memory
simultaneously. This technique is primarily useful for inserts into Parquet
tables, where the large block size requires substantial memory to buffer data
for multiple output files at once. This hint is available in CDH 5.10 /
Impala 2.8 or higher.
Starting in CDH 6.0 / Impala 3.0, /* +CLUSTERED */ is the default behavior for
HDFS tables.
{noformat}
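
For illustration, a minimal sketch of where the hint sits in a dynamic-partition insert. The table and column names (sales_parquet, sales_staging, id, amount, year, month) are hypothetical and not taken from this issue; the hint is placed immediately before the SELECT keyword:

{noformat}
-- Hypothetical tables: sales_parquet is a Parquet table partitioned by (year, month).
-- Sort rows by the partition columns so each node writes one partition at a time.
INSERT INTO sales_parquet PARTITION (year, month) /* +CLUSTERED */
SELECT id, amount, year, month
FROM sales_staging;

-- Opt out of the sort, e.g. on releases where clustering is the default.
INSERT INTO sales_parquet PARTITION (year, month) /* +NOCLUSTERED */
SELECT id, amount, year, month
FROM sales_staging;
{noformat}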

As such, I am marking this fixed.

> Consider extending the hints for insert
> ---------------------------------------
>
>                 Key: IMPALA-1073
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1073
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 1.3.1
>            Reporter: Nong Li
>            Priority: Minor
>
> We currently have the SHUFFLE/NOSHUFFLE hints for insert, but they seem to be 
> insufficient and still lead to some pain. We should consider adding the 
> RANDOM and UNPARTITIONED strategies as well to have better control of the 
> resulting number of files/file sizes. Whether these are added as hints, query 
> options or improvements to the plan is open.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
