Casey Ching has posted comments on this change. Change subject: IMPALA-2848: Simplify creation of a new Kudu table ......................................................................
Patch Set 4: (20 comments) http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java: Line 51: private final boolean isPrimaryKey_; > You may want to add a comment. This is specific to kudu tables only. Done Line 178: mapByColumnNames( > maybe "getNameToColumnDefinitionMap()" We probably have too many "get..." methods already. Ideally "get" would retrieve something. I'm open to changing though, what about it don't you like? Maybe "toColumnNameMap()" or "convert..."? http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java: Line 195: : > I thought this was only for Kudu tables. It might be, it's redundant with the line below so I just removed it. (The IDE complains about it.) http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 104: those are not included here > Out of curiosity, why is that the case? Updated the comment, it's just a different data structure. The column level pk info is stored in the ColumnDef class. I'm not sure if that's what you are asking about though. Line 179: throwIfPrimaryKeysWereSpecified > Can't you use the generic throwIfNotNullOrNotEmpty here as well? Well the ColumnDefs need to be checked too. So more work is needed (iterate over all of them). Line 211: // TODO: Find out what is creating a directory in HDFS and stop doing that. Kudu : // tables shouldn't have HDFS dirs. > Is this still relevant? Maybe create a JIRA and reference it here? Done Line 221: throwIfPrimaryKeysWereSpecified( > Same comment as in L179 Same Line 233: else { > I would put the entire else block in a separate function. It's best to explain why, this statement isn't really compelling :) IMO it's an even trade. This way the logic is altogether. As a reader who doesn't care about Kudu, there's only one function to skip over. If you do care, now you may need to skip around in the code. If you want to modify the separate function, you need to find all the callers and check that your modifications integrate well with the callers. It's not a big deal, I'll change it if you really want but I don't see how it's an improvement. Line 265: if (!getDistributeParams().isEmpty()) { : Map<String, ColumnDef> primaryKeyColDefsByName : = ColumnDef.mapByColumnNames(getPrimaryKeyColumnDefs()); : Map<String, DistributeParam> hashParamsByColName = Maps.newHashMap(); : for (DistributeParam distributeParam: getDistributeParams()) { : List<String> colNames; : if (distributeParam.getColumnNames().isEmpty()) { : colNames = ColumnDef.toColumnNames(getPrimaryKeyColumnDefs()); : } else { : colNames = distributeParam.getColumnNames(); : } : for (String colName : colNames) { : ColumnDef columnDef = primaryKeyColDefsByName.get(colName); : if (columnDef == null) { : throw new AnalysisException(String.format("Column '%s' in '%s' is not " + : "a key column. Only key columns can be used in DISTRIBUTE BY.", : colName, distributeParam.toSql())); : } : if (distributeParam.getType() == Type.HASH) { : DistributeParam otherDistributeParam = hashParamsByColName.get(colName); : if (otherDistributeParam == null) { : hashParamsByColName.put(colName, distributeParam); : } else { : throw new AnalysisException(String.format("Column '%s' used in '%s' is " + : "already used in '%s'", colName, distributeParam.toSql(), : otherDistributeParam.toSql())); : } : } : distributeParam.getColumnDefs().add(columnDef); : } : distributeParam.analyze(analyzer); : } : } > Can you briefly comment at the beginning of that block what you're doing he Done Line 314: isKeyType > nit: maybe rename to isSupportedKeyType Done http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java: Line 82: // The ColumnDefs that the colNames_ refer to. The caller of analyze() must set these : // before calling. : private final List<ColumnDef> colDefs_ = Lists.newArrayList(); > nit: can you move it below colNames_? Done but this breaks up the fields set by the parser vs others. It's not really clear this is better. Line 89: for (String name: colNames) { : colNames_.add(name.toLowerCase()); : } > single line Done but where did this come from? I bet I can find a lot of for loops that could fit on a single line. I'm not sure changing things like this is worth the time. Line 98: Preconditions.checkState(!colDefs_.isEmpty()); : Preconditions.checkState(colNames_.isEmpty() || colNames_.size() == colDefs_.size()); > Not sure I follow those checks. Is it ok to have colDefs_ which are not emp Yes, added a comment to colNames_. Line 101: Integer.MAX_VALUE > Interesting, can Kudu really handle that many buckets? :) Not sure, maybe they are planning for the future Line 106: longValue() > Can't you use that in L101 to avoid the creation of a BigDecimal instance? That's probably not safe, BigDecimal could be bigger than a long. Line 127: Preconditions.checkState(colType.isIntegerType() || colType.isStringType()); > I believe you already have a function for that check. Done http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableDef.java File fe/src/main/java/com/cloudera/impala/analysis/TableDef.java: Line 48: private List<ColumnDef> primaryKeyColDefs_ = Lists.newArrayList(); > final? Done http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableName.java File fe/src/main/java/com/cloudera/impala/analysis/TableName.java: Line 73: if (db_ == null) { : return tbl_; > I don't think a fully qualified name can have only a table nane; Ah right. I can't even figure out why I changed this. Reverted. http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: Line 155: TableLoadingException > parseColumnType throws a TableLoadingException? this is weird... Ya it does look weird. I think Impala basically "punted" on this. I haven't had time to look into it either. http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java: Line 190: provide an : * explicit name. > That description is kind of awkward given that the input params are a datab Reworded -- To view, visit http://gerrit.cloudera.org:8080/2984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iba5714941a81bad7a410c850e4037637cffaf8ab Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Casey Ching <[email protected]> Gerrit-Reviewer: Casey Ching <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
