Tim Armstrong created IMPALA-7400:
-------------------------------------
Summary: "SQL Statements to Remove or Adapt" is out of date
Key: IMPALA-7400
URL: https://issues.apache.org/jira/browse/IMPALA-7400
Project: IMPALA
Issue Type: Bug
Components: Docs
Affects Versions: Impala 3.0
Reporter: Tim Armstrong
Assignee: Alex Rodoni
"Impala has no DELETE statement." and "Impala has no UPDATE statement." are
not entirely accurate - Impala has both statements, but only for Kudu tables.
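As a minimal sketch of what the corrected text could show (kudu_demo and its
columns are hypothetical names, not tables from the docs or the functional
test database):
{code}
-- Hypothetical Kudu table; UPDATE and DELETE are accepted only for Kudu tables.
create table kudu_demo (id bigint primary key, name string)
  partition by hash (id) partitions 2
  stored as kudu;
insert into kudu_demo values (1, 'a'), (2, 'b');
update kudu_demo set name = 'renamed' where id = 1;
delete from kudu_demo where id = 2;
{code}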
"For example, Impala does not support natural joins or anti-joins," - Impala
does support anti-joins, either via NOT IN/NOT EXISTS or explicitly, e.g.:
{code}
select * from functional.alltypes a1 left anti join functional.alltypestiny a2
on a1.id = a2.id;
{code}
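The subquery forms could be illustrated along these lines (a sketch using the
same tables; note that NOT IN treats NULLs differently from NOT EXISTS, so the
two are not interchangeable when id can be NULL):
{code}
-- Anti-join written with NOT EXISTS.
select * from functional.alltypes a1
where not exists (select 1 from functional.alltypestiny a2 where a1.id = a2.id);
-- Anti-join written with NOT IN.
select * from functional.alltypes a1
where a1.id not in (select a2.id from functional.alltypestiny a2);
{code}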
"Within queries, Impala requires query aliases for any subqueries:" - this is
only true for subqueries used as inline views in the FROM clause. Subqueries
elsewhere do not need an alias, e.g. the following works:
{code}
select * from functional.alltypes where id = (select min(id) from
functional.alltypes);
{code}
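For contrast, a sketch of the case where the alias is needed (v and min_id are
illustrative names):
{code}
-- Inline view in the FROM clause: the alias v is required here.
select v.min_id
from (select min(id) as min_id from functional.alltypes) v;
{code}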
"Impala .. requires the CROSS JOIN operator for Cartesian products." - this is
untrue. The following works:
{code}
select * from functional.alltypes t1, functional.alltypes t2;
{code}
"Have you run the COMPUTE STATS statement on each table involved in join
queries". This isn't specific to queries with joins, although stats may have
more impact on them. We recommend that users run COMPUTE STATS on all tables.
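For instance, the doc could show the statement on a table regardless of
whether it is joined (a sketch using a table from the examples above):
{code}
compute stats functional.alltypes;
-- Check that row counts and column stats were populated.
show table stats functional.alltypes;
show column stats functional.alltypes;
{code}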
"A CREATE TABLE statement with no PARTITIONED BY clause stores all the data
files in the same physical location," - unpartitioned tables with multiple
files can have files residing in different locations (and there are already 3
replicas per file by default, so the statement is a little misleading even if
there's a single file). I think the later statement, "Have you
partitioned at the right granularity so that there is enough data in each
partition to parallelize the work for each query?", is also misleading for the
same reason.
"The INSERT ... VALUES syntax is suitable for setting up toy tables with a few
rows for functional testing, but because each such statement creates a separate
tiny file in HDFS". This advice applies only to HDFS. INSERT ... VALUES should
work fine for Kudu tables, although the INSERT statements are not particularly
fast.
"The number of expressions allowed in an Impala query might be smaller than for
some other database systems, causing failures for very complicated queries" -
this doesn't seem right. I don't know why the queries would fail. Also, the
codegen time isn't really specific to expressions or WHERE clauses. There seems
to be a point buried in there, but maybe it's essentially just that "Complex
queries may have high codegen time".