Tim Armstrong created IMPALA-7400:
-------------------------------------

             Summary: "SQL Statements to Remove or Adapt" is out of date
                 Key: IMPALA-7400
                 URL: https://issues.apache.org/jira/browse/IMPALA-7400
             Project: IMPALA
          Issue Type: Bug
          Components: Docs
    Affects Versions: Impala 3.0
            Reporter: Tim Armstrong
            Assignee: Alex Rodoni


"Impala has no DELETE statement." and "Impala has no UPDATE statement." are 
not totally true - Impala has those statements, but only for Kudu tables.
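
A minimal sketch of what this looks like, assuming a hypothetical Kudu table 
(the name kudu_demo and its columns are made up for illustration and do not 
come from the docs):
{code}
-- kudu_demo is a hypothetical table used only for this example.
create table kudu_demo (id bigint primary key, name string)
  partition by hash (id) partitions 2 stored as kudu;
insert into kudu_demo values (1, 'a'), (2, 'b');
-- UPDATE and DELETE are accepted because the table is stored in Kudu.
update kudu_demo set name = 'c' where id = 1;
delete from kudu_demo where id = 2;
{code}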

"For example, Impala does not support natural joins or anti-joins," - Impala 
does support anti-joins via NOT IN/NOT EXISTS, or even explicitly like:
{code}
select * from functional.alltypes a1 left anti join functional.alltypestiny a2 
on a1.id = a2.id;
{code}
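
For comparison, a sketch of the NOT EXISTS form of an anti-join over the same 
two functional tables (written for illustration, not checked against a 
specific release):
{code}
-- Returns rows of alltypes that have no matching id in alltypestiny.
select * from functional.alltypes a1
where not exists (select 1 from functional.alltypestiny a2 where a1.id = a2.id);
{code}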

"Within queries, Impala requires query aliases for any subqueries:" - this is 
only true for subqueries used as inline views in the FROM clause. E.g. the 
following works:
{code}
select * from functional.alltypes where id = (select min(id) from 
functional.alltypes);
{code}

" Impala .. requires the CROSS JOIN operator for Cartesian products." - untrue, 
this works:
{code}
select * from functional.alltypes t1, functional.alltypes t2;
{code}


"Have you run the COMPUTE STATS statement on each table involved in join 
queries". This isn't specific to queries with joins, although it may have more 
impact there. We recommend that users run COMPUTE STATS on all tables.

"A CREATE TABLE statement with no PARTITIONED BY clause stores all the data 
files in the same physical location," - unpartitioned tables with multiple 
files can have files residing in different locations (and there are already 3 
replicas per file by default, so the statement is a little misleading even if 
there's a single file). I think the later statement about "Have you 
partitioned at the right granularity so that there is enough data in each 
partition to parallelize the work for each query?" is also misleading for the 
same reason.

"The INSERT ... VALUES syntax is suitable for setting up toy tables with a few 
rows for functional testing, but because each such statement creates a separate 
tiny file in HDFS". This advice only applies to HDFS; it should work fine for 
Kudu tables, although the INSERT statements are not particularly fast.

"The number of expressions allowed in an Impala query might be smaller than for 
some other database systems, causing failures for very complicated queries" - 
this doesn't seem right - I don't know why the queries would fail. Also the 
codegen time isn't really specific to expressions or where clauses. There seems 
to be a point buried in there, but maybe it's just essentially that "Complex 
queries may have high codegen time".


