Matthew Jacobs has posted comments on this change. Change subject: Simplify creating external Kudu tables and add DROP DATABASE CASCADE ......................................................................
Patch Set 3: (38 comments) First pass. Haven't gotten to all of the python tests yet. http://gerrit.cloudera.org:8080/#/c/2617/3//COMMIT_MSG Commit Message: Line 23: KUDU as a file format what is Hive planning to do here? Ideally we can be compatible http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1243: {: RESULT = null; :} why change this? http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java: Line 123: f (comment_ != null ? : !comment_.equals(columnDef.comment_) : columnDef.comment_ != null) return false; these nested ternary statements are hard to read Line 118: public boolean equals(Object o) { : if (this == o) return true; : if (o == null || getClass() != o.getClass()) return false; : ColumnDef columnDef = (ColumnDef) o; : if (!colName_.equals(columnDef.colName_)) return false; : if (comment_ != null ? : !comment_.equals(columnDef.comment_) : columnDef.comment_ != null) return false; : if (typeDef_ != null ? : !typeDef_.equals(columnDef.typeDef_) : columnDef.typeDef_ != null) return false; : return !(type_ != null ? !type_.equals(columnDef.type_) : columnDef.type_ != null); : } would be cleaner to use org.apache.commons.lang.builder.EqualsBuilder http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 81: nit: odd spacing Line 194: setTable_properties(tblProperties_); could the behavior be different in cases where this wasn't set previously? Line 224: KuduTable.KUDU_STORAGE_HANDLER.equals( : tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER) we should just make sure it's OK with others if we don't support the Hive syntax (even though it's stupid) Line 236: fileFormat_ != THdfsFileFormat.KUDU can we only throw for kudu if not external? Line 298: rowFormat_ = RowFormat.DEFAULT_ROW_FORMAT; I'm not sure if this is OK. If reset() gets called and then re-analyze, I think we'll throw on l295. I'm not sure exactly what would trigger that but I had to be careful about this in the past. Is there a reason we can't keep using DEFAULT_ROW_FORMAT instead of null? Line 304: AnalysisException do we need this as well as the exception on l229? Line 307: tblProperties_ Same concern about re-analysis since we don't have the original state and then calling reset() -> analysis() could throw. You should check with Alex/Marcel. It's possible we don't have to worry about reanalysis when dealing with CREATE TABLE statements today, but in general we avoid doing this so we should probably be consistent. I'm fine with whatever Alex or Marcel says here. Line 313: location what location is this referring to? I'm not sure what this comment is about. Line 347: nit: extra space for our java style Line 349: TODO: Investigate what happens if more than one DistributeComponent : // without any columns is specified. Maybe reject that here? can you add a test case for this? Line 362: throwIfExists how about throwIfNotEmpty to avoid confusion with the above fn? Line 358: private void throwIfExists(Object o, String message) throws AnalysisException { : if (o != null) throw new AnalysisException(message); : } : : private void throwIfExists(Collection c, String message) throws AnalysisException { : if (c != null && !c.isEmpty()) throw new AnalysisException(message); : } : : private void throwIfNotExists(Object o, String message) throws AnalysisException { : if (o == null) throw new AnalysisException(message); : } can you move these to static methods on a utility class, maybe AnalysisUtil? http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: Line 153: private brief comment http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java: Line 64: in not extra word Line 63: If the table does not exist : * or in not managed an exception will not be thrown Seems like an implementation detail given this is an abstract method, and it kind of rehashes the first sentence. But if you wanna keep this, it's easier to parse if you flip the condition, e.g. "A no-op for non-existent or non-managed tables." http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java: Line 51: REQUIRED_COLUMN_DEF comment that this is to satisfy thrift? http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 1129: For : // now Impala will assume that any table not in its cache also doesn't require the : // use of a DdlDelegate. So it sounds like there are 3(?) concerning cases: 1) db is unknown to impala (i.e. null here) but it has a kudu table in it 2) (related to 1) impala has outdated db and there is actually a kudu table not in db.getTables() 3) user created a kudu table in impala but drops it in hive for 1-2, what is the result? the database is not actually dropped and the tables we didn't know about remain? for 3, does that mean the metastore info is dropped but the table continues to live in kudu? Line 1146: 4 spaces Line 1533: distributeBy nit: add this to the header comment too Line 1537: populating columns and such By mutating newTable directly? If so, can you say that. http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/util/KuduClient.java File fe/src/main/java/com/cloudera/impala/util/KuduClient.java: Line 34: public class KuduClient implements AutoCloseable{ nit space http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: Line 80: nit extra space Line 127: ensure_kudu_installed_if_possible ensure_X_if_possible is funny. How about try_install_kudu? Line 129: If the toolchain isn't in use or hasn't been populated : yet, nothing will be done. In most cases (at least a fresh env calling buildall.sh) the toolchain should have been populated first, right? (i.e. there's no circular dependency) Line 131: able supported? Line 176: s extra s http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: Line 56: Cython == 0.23.4 does the order matter? if not, can you put this special kudu stuff at the bottom? Or at least add a space between 55 and 56 http://gerrit.cloudera.org:8080/#/c/2617/3/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test File testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test: Line 9: ) It's not ideal that this create table result statement will not be executable. Line 11: ==== is this just whitespace? http://gerrit.cloudera.org:8080/#/c/2617/3/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: Line 53: if not cls.__DB_NAME: : cls.__DB_NAME = \ : choice(ascii_lowercase) + "".join(sample(ascii_lowercase + digits, 5)) : return cls.__DB_NAME potential race if tests are run in parallel? why does the db_name need to be a class member generated lazily? can you just create it in setup_class(cls)? Line 59: def random_table_name(cls): get_ Line 78: assert not kudu.table_exists(kudu_table.name) this line needs to be in the with block Line 103: drop_impala_table_after_context comment please, not immediately clear how to use this http://gerrit.cloudera.org:8080/#/c/2617/3/tests/conftest.py File tests/conftest.py: Line 264: conn Is all this stuff needed in this review? I wasn't sure what's going on with a lot of the stuff here and it could benefit from being in a separate review to break things up a bit and where someone with more python experience can review this better. -- To view, visit http://gerrit.cloudera.org:8080/2617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Casey Ching <[email protected]> Gerrit-Reviewer: Casey Ching <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
