Casey Ching has posted comments on this change. Change subject: Simplify creating external Kudu tables and add DROP DATABASE CASCADE ......................................................................
Patch Set 3: (38 comments) http://gerrit.cloudera.org:8080/#/c/2617/3//COMMIT_MSG Commit Message: Line 23: KUDU as a file format > what is Hive planning to do here? Ideally we can be compatible Kudu isn't supported at all in Hive yet. As far as I know there's no way to use it. Someone (from Cloudera) is working on it though. https://issues.apache.org/jira/browse/HIVE-12971 is for this. We need to coordinate with them eventually. http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1243: {: RESULT = null; :} > why change this? Row format is specific to text files. So it should never be used with non-text files. As it was, there was no way to differentiate between a user specifying the default row format or not specifying the row format. Now if the value is null, that indicates no row format was given. http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java: Line 123: f (comment_ != null ? : !comment_.equals(columnDef.comment_) : columnDef.comment_ != null) return false; > these nested ternary statements are hard to read Heh, this is all generated code so I didn't really look at it much. Line 118: public boolean equals(Object o) { : if (this == o) return true; : if (o == null || getClass() != o.getClass()) return false; : ColumnDef columnDef = (ColumnDef) o; : if (!colName_.equals(columnDef.colName_)) return false; : if (comment_ != null ? : !comment_.equals(columnDef.comment_) : columnDef.comment_ != null) return false; : if (typeDef_ != null ? : !typeDef_.equals(columnDef.typeDef_) : columnDef.typeDef_ != null) return false; : return !(type_ != null ? !type_.equals(columnDef.type_) : columnDef.type_ != null); : } > would be cleaner to use org.apache.commons.lang.builder.EqualsBuilder Done. Btw if you use a common IDE there is probably a plugin to generate using that builder https://plugins.jetbrains.com/plugin/7283?pr= http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 81: > nit: odd spacing Changed to do what the java people seem to do. I looked at some of their code. Line 194: setTable_properties(tblProperties_); > could the behavior be different in cases where this wasn't set previously? I guess it's possible but there doesn't seem to be any difference. I expect the tests would fail if there was a problem. Line 224: KuduTable.KUDU_STORAGE_HANDLER.equals( : tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER) > we should just make sure it's OK with others if we don't support the Hive s I think this is closer to the standard syntax. Line 236: fileFormat_ != THdfsFileFormat.KUDU > can we only throw for kudu if not external? That's possible but keeping the Kudu logic together as much as possible seems better. For example maybe there are some cases where avro tables require columns too, it might get messy having all the details here. Line 298: rowFormat_ = RowFormat.DEFAULT_ROW_FORMAT; > I'm not sure if this is OK. If reset() gets called and then re-analyze, I t Ah I hadn't considered reset(). Keeping null for the row format seems better since it's really just a text table thing. The row format is actually optional in TCreateTableParams. Maybe that will work. Are you ok with that? Line 304: AnalysisException > do we need this as well as the exception on l229? They are both needed but could be moved closer together. I'm fine either way. Line 307: tblProperties_ > Same concern about re-analysis since we don't have the original state and t Hmm, ya maybe I'll keep the properties set during analysis in a separate place that can be cleared during reset(). Line 313: location > what location is this referring to? I'm not sure what this comment is about Updated the TODO. Let me know if it still doesn't make sense. Line 347: > nit: extra space for our java style ridiculous Line 349: TODO: Investigate what happens if more than one DistributeComponent : // without any columns is specified. Maybe reject that here? > can you add a test case for this? It's a TODO? Line 362: throwIfExists > how about throwIfNotEmpty to avoid confusion with the above fn? Changed to throwIfNotNullOrNotEmpty() Line 358: private void throwIfExists(Object o, String message) throws AnalysisException { : if (o != null) throw new AnalysisException(message); : } : : private void throwIfExists(Collection c, String message) throws AnalysisException { : if (c != null && !c.isEmpty()) throw new AnalysisException(message); : } : : private void throwIfNotExists(Object o, String message) throws AnalysisException { : if (o == null) throw new AnalysisException(message); : } > can you move these to static methods on a utility class, maybe AnalysisUtil Done http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: Line 153: private > brief comment Done http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java: Line 64: in not > extra word Done Line 63: If the table does not exist : * or in not managed an exception will not be thrown > Seems like an implementation detail given this is an abstract method, and i I left it as is. The sentences list the information in order of importance. Maybe it's not clear and I can reword this, but subclasses must adhere to this description. Ex, they must not throw if the table doesn't exist. http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java: Line 51: REQUIRED_COLUMN_DEF > comment that this is to satisfy thrift? Renamed it to avoid the comment. http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 1129: For : // now Impala will assume that any table not in its cache also doesn't require the : // use of a DdlDelegate. > So it sounds like there are 3(?) concerning cases: Yes the Kudu tables would remain but the database metadata in Hive would be gone. Added to the comment. Line 1146: > 4 spaces Done Line 1533: distributeBy > nit: add this to the header comment too Done. I didn't put much effort into this since it doesn't really make sense. I'll clean it up later. Line 1537: populating columns and such > By mutating newTable directly? If so, can you say that. Done http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/util/KuduClient.java File fe/src/main/java/com/cloudera/impala/util/KuduClient.java: Line 34: public class KuduClient implements AutoCloseable{ > nit space Done http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: Line 80: > nit extra space Done Line 127: ensure_kudu_installed_if_possible > ensure_X_if_possible is funny. How about try_install_kudu? renamed Line 129: If the toolchain isn't in use or hasn't been populated : yet, nothing will be done. > In most cases (at least a fresh env calling buildall.sh) the toolchain shou I'm not sure why you bring up "most cases" since it needs to work in all cases. The toolchain bootstrap depends on the virtualenv so this really shouldn't run the first time around. Line 131: able > supported? Shoould have been "available" Line 176: s > extra s Done http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: Line 56: Cython == 0.23.4 > does the order matter? if not, can you put this special kudu stuff at the b Oops I meant to indent cython/numpy. They are below to show the dependency. http://gerrit.cloudera.org:8080/#/c/2617/3/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test File testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test: Line 9: ) > It's not ideal that this create table result statement will not be executab Ya the create managed table part will be done later. It's just in a weird state now. Line 11: ==== > is this just whitespace? I'm not sure what going on here. The file seems fine locally. No extra line or spaces. http://gerrit.cloudera.org:8080/#/c/2617/3/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: Line 53: if not cls.__DB_NAME: : cls.__DB_NAME = \ : choice(ascii_lowercase) + "".join(sample(ascii_lowercase + digits, 5)) : return cls.__DB_NAME > potential race if tests are run in parallel? why does the db_name need to b Yes there is a race. I need to test this more. This needs to be its own thing because the method has an external caller. Line 59: def random_table_name(cls): > get_ Left as is. If anything I'd change it to generate_. I think it reads fine though. Line 78: assert not kudu.table_exists(kudu_table.name) > this line needs to be in the with block It's correct as-is. The Kudu table won't exist outside the block and that's what this line is showing. Line 103: drop_impala_table_after_context > comment please, not immediately clear how to use this Done http://gerrit.cloudera.org:8080/#/c/2617/3/tests/conftest.py File tests/conftest.py: Line 264: conn > Is all this stuff needed in this review? I wasn't sure what's going on with The tests use this stuff. It seems pretty small compared to the other stuff. Anyhow I can check in the test infra first if you like. I'm not sure if there would be tests to go along with the infra though. The tests would be in the following commit. -- To view, visit http://gerrit.cloudera.org:8080/2617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Casey Ching <[email protected]> Gerrit-Reviewer: Casey Ching <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
