Thomas Tauber-Marshall created IMPALA-7171:
----------------------------------------------
Summary: Add docs for Kudu insert partitioning/sorting
Key: IMPALA-7171
URL: https://issues.apache.org/jira/browse/IMPALA-7171
Project: IMPALA
Issue Type: Improvement
Reporter: Thomas Tauber-Marshall
On the page:
http://impala.apache.org/docs/build3x/html/topics/impala_kudu.html, at the end
of the section: "Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE,
UPSERT)", we should add text like:
Starting from Impala 2.9, Impala automatically adds a partition and sort
step to INSERTs before sending the rows to Kudu. Since Kudu partitions and
sorts rows on write, pre-partitioning and sorting takes some of the load off
Kudu and helps ensure that large INSERTs complete without timing out. However,
it may slow down the end-to-end performance of the INSERT. Starting from Impala
2.10, the hints "/* +noshuffle,noclustered */" may be used to turn this
pre-partitioning and sorting off. Additionally, since sorting may consume a lot
of memory, users should consider setting a "mem_limit" for these queries.
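
It might also help to include a short example alongside that text. The
following is only a sketch: the table names "kudu_events" and "staging_events"
are hypothetical, the 2g limit is an arbitrary placeholder rather than a
recommendation, and the hint is assumed to go immediately before the SELECT,
as with other Impala INSERT hints.

```sql
-- Hypothetical tables: kudu_events is Kudu-backed, staging_events is the source.

-- Default behavior (Impala 2.9 and later): rows are partitioned and sorted
-- before being sent to Kudu. Capping memory for the session bounds the sort.
SET MEM_LIMIT=2g;  -- placeholder value, tune for the workload
INSERT INTO kudu_events
SELECT * FROM staging_events;

-- Impala 2.10 and later: turn the pre-partitioning and sorting off with hints.
INSERT INTO kudu_events /* +noshuffle,noclustered */
SELECT * FROM staging_events;
```

The hinted form shifts the partitioning and sorting work back onto Kudu, so it
is probably best suited to INSERTs small enough that timeouts are not a
concern.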