Matthew Jacobs has posted comments on this change.

Change subject: Simplify creating external Kudu tables and add DROP DATABASE 
CASCADE
......................................................................


Patch Set 3:

(38 comments)

First pass. Haven't gotten to all of the python tests yet.

http://gerrit.cloudera.org:8080/#/c/2617/3//COMMIT_MSG
Commit Message:

Line 23: KUDU as a file format
what is Hive planning to do here? Ideally we can be compatible


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1243:   {: RESULT = null; :}
why change this?


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 123: f (comment_ != null ?
        :         !comment_.equals(columnDef.comment_) : columnDef.comment_ != 
null) return false;
these nested ternary statements are hard to read


Line 118: public boolean equals(Object o) {
        :     if (this == o) return true;
        :     if (o == null || getClass() != o.getClass()) return false;
        :     ColumnDef columnDef = (ColumnDef) o;
        :     if (!colName_.equals(columnDef.colName_)) return false;
        :     if (comment_ != null ?
        :         !comment_.equals(columnDef.comment_) : columnDef.comment_ != 
null) return false;
        :     if (typeDef_ != null ?
        :         !typeDef_.equals(columnDef.typeDef_) : columnDef.typeDef_ != 
null) return false;
        :     return !(type_ != null ? !type_.equals(columnDef.type_) : 
columnDef.type_ != null);
        :   }
would be cleaner to use org.apache.commons.lang.builder.EqualsBuilder


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 81:  
nit: odd spacing


Line 194: setTable_properties(tblProperties_);
could the behavior be different in cases where this wasn't set previously?


Line 224: KuduTable.KUDU_STORAGE_HANDLER.equals(
        :             tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER)
we should just make sure it's OK with others if we don't support the Hive 
syntax (even though it's stupid)


Line 236: fileFormat_ != THdfsFileFormat.KUDU
can we only throw for kudu if not external?


Line 298: rowFormat_ = RowFormat.DEFAULT_ROW_FORMAT;
I'm not sure if this is OK. If reset() gets called and then re-analyze, I think 
we'll throw on l295. I'm not sure exactly what would trigger that but I had to 
be careful about this in the past.

Is there a reason we can't keep using DEFAULT_ROW_FORMAT instead of null?


Line 304: AnalysisException
do we need this as well as the exception on l229?


Line 307: tblProperties_
Same concern about re-analysis since we don't have the original state and then 
calling reset() -> analysis() could throw.

You should check with Alex/Marcel. It's possible we don't have to worry about 
reanalysis when dealing with CREATE TABLE statements today, but in general we 
avoid doing this so we should probably be consistent. I'm fine with whatever 
Alex or Marcel says here.


Line 313: location
what location is this referring to? I'm not sure what this comment is about.


Line 347:  
nit: extra space for our java style


Line 349: TODO: Investigate what happens if more than one DistributeComponent
        :           //       without any columns is specified. Maybe reject 
that here?
can you add a test case for this?


Line 362: throwIfExists
how about throwIfNotEmpty to avoid confusion with the above fn?


Line 358:   private void throwIfExists(Object o, String message) throws 
AnalysisException {
        :     if (o != null) throw new AnalysisException(message);
        :   }
        : 
        :   private void throwIfExists(Collection c, String message) throws 
AnalysisException {
        :     if (c != null && !c.isEmpty()) throw new 
AnalysisException(message);
        :   }
        : 
        :   private void throwIfNotExists(Object o, String message) throws 
AnalysisException {
        :     if (o == null) throw new AnalysisException(message);
        :   }
can you move these to static methods on a utility class, maybe AnalysisUtil?


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 153: private
brief comment


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java
File fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java:

Line 64: in not
extra word


Line 63: If the table does not exist
       :    * or in not managed an exception will not be thrown
Seems like an implementation detail given this is an abstract method, and it 
kind of rehashes the first sentence. But if you wanna keep this, it's easier to 
parse if you flip the condition, e.g. "A no-op for non-existent or non-managed 
tables."


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
File 
fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java:

Line 51: REQUIRED_COLUMN_DEF
comment that this is to satisfy thrift?


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 1129: For
         :     // now Impala will assume that any table not in its cache also 
doesn't require the
         :     // use of a DdlDelegate.
So it sounds like there are 3(?) concerning cases:
1) db is unknown to impala (i.e. null here) but it has a kudu table in it
2) (related to 1) impala has outdated db and there is actually a kudu table not 
in db.getTables()
3) user created a kudu table in impala but drops it in hive

for 1-2, what is the result? the database is not actually dropped and the 
tables we didn't know about remain?

for 3, does that mean the metastore info is dropped but the table continues to 
live in kudu?


Line 1146:         
4 spaces


Line 1533: distributeBy
nit: add this to the header comment too


Line 1537: populating columns and such
By mutating newTable directly? If so, can you say that.


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/util/KuduClient.java
File fe/src/main/java/com/cloudera/impala/util/KuduClient.java:

Line 34: public class KuduClient implements AutoCloseable{
nit space


http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

Line 80:   
nit extra space


Line 127: ensure_kudu_installed_if_possible
ensure_X_if_possible is funny. How about try_install_kudu?


Line 129: If the toolchain isn't in use or hasn't been populated
        :      yet, nothing will be done.
In most cases (at least a fresh env calling buildall.sh) the toolchain should 
have been populated first, right? (i.e. there's no circular dependency)


Line 131: able
supported?


Line 176: s
extra s


http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

Line 56: Cython == 0.23.4
does the order matter? if not, can you put this special kudu stuff at the 
bottom? Or at least add a space between 55 and 56


http://gerrit.cloudera.org:8080/#/c/2617/3/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test:

Line 9: )
It's not ideal that this create table result statement will not be executable.


Line 11: ====
is this just whitespace?


http://gerrit.cloudera.org:8080/#/c/2617/3/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

Line 53: if not cls.__DB_NAME:
       :       cls.__DB_NAME = \
       :           choice(ascii_lowercase) + "".join(sample(ascii_lowercase + 
digits, 5))
       :     return cls.__DB_NAME
potential race if tests are run in parallel? why does the db_name need to be a 
class member generated lazily? can you just create it in setup_class(cls)?


Line 59:   def random_table_name(cls):
get_


Line 78:          assert not kudu.table_exists(kudu_table.name)
this line needs to be in the with block


Line 103: drop_impala_table_after_context
comment please, not immediately clear how to use this


http://gerrit.cloudera.org:8080/#/c/2617/3/tests/conftest.py
File tests/conftest.py:

Line 264: conn
Is all this stuff needed in this review? I wasn't sure what's going on with a 
lot of the stuff here and it could benefit from being in a separate review to 
break things up a bit and where someone with more python experience can review 
this better.


-- 
To view, visit http://gerrit.cloudera.org:8080/2617
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

Reply via email to