[Impala-CR](cdh5-trunk) IMPALA-2848: Simplify creation of a new Kudu table

Casey Ching (Code Review) Wed, 18 May 2016 15:16:15 -0700

Casey Ching has posted comments on this change.

Change subject: IMPALA-2848: Simplify creation of a new Kudu table
......................................................................



Patch Set 4:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 51: private final boolean isPrimaryKey_;
> You may want to add a comment. This is specific to kudu tables only.
Done


Line 178: mapByColumnNames(
> maybe "getNameToColumnDefinitionMap()"
We probably have too many "get..." methods already. Ideally "get" would 
retrieve something. I'm open to changing though, what about it don't you like? 
Maybe "toColumnNameMap()" or "convert..."?


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java:

Line 195: 
        : 
> I thought this was only for Kudu tables.
It might be, it's redundant with the line below so I just removed it. (The IDE 
complains about it.)


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 104: those are not included here
> Out of curiosity, why is that the case?
Updated the comment, it's just a different data structure. The column level pk 
info is stored in the ColumnDef class. I'm not sure if that's what you are 
asking about though.


Line 179: throwIfPrimaryKeysWereSpecified
> Can't you use the generic throwIfNotNullOrNotEmpty here as well?
Well the ColumnDefs need to be checked too. So more work is needed (iterate 
over all of them).


Line 211: // TODO: Find out what is creating a directory in HDFS and stop doing 
that. Kudu
        :     //       tables shouldn't have HDFS dirs.
> Is this still relevant? Maybe create a JIRA and reference it here?
Done


Line 221: throwIfPrimaryKeysWereSpecified(
> Same comment as in L179
Same


Line 233: else {
> I would put the entire else block in a separate function.
It's best to explain why, this statement isn't really compelling :)

IMO it's an even trade. This way the logic is altogether. As a reader who 
doesn't care about Kudu, there's only one function to skip over. If you do 
care, now you may need to skip around in the code. If you want to modify the 
separate function, you need to find all the callers and check that your 
modifications integrate well with the callers.

It's not a big deal, I'll change it if you really want but I don't see how it's 
an improvement.


Line 265: if (!getDistributeParams().isEmpty()) {
        :         Map<String, ColumnDef> primaryKeyColDefsByName
        :             = ColumnDef.mapByColumnNames(getPrimaryKeyColumnDefs());
        :         Map<String, DistributeParam> hashParamsByColName = 
Maps.newHashMap();
        :         for (DistributeParam distributeParam: getDistributeParams()) {
        :           List<String> colNames;
        :           if (distributeParam.getColumnNames().isEmpty()) {
        :             colNames = 
ColumnDef.toColumnNames(getPrimaryKeyColumnDefs());
        :           } else {
        :             colNames = distributeParam.getColumnNames();
        :           }
        :           for (String colName : colNames) {
        :             ColumnDef columnDef = 
primaryKeyColDefsByName.get(colName);
        :             if (columnDef == null) {
        :               throw new AnalysisException(String.format("Column '%s' 
in '%s' is not " +
        :                       "a key column. Only key columns can be used in 
DISTRIBUTE BY.",
        :                   colName, distributeParam.toSql()));
        :             }
        :             if (distributeParam.getType() == Type.HASH) {
        :               DistributeParam otherDistributeParam = 
hashParamsByColName.get(colName);
        :               if (otherDistributeParam == null) {
        :                 hashParamsByColName.put(colName, distributeParam);
        :               } else {
        :                 throw new AnalysisException(String.format("Column 
'%s' used in '%s' is " +
        :                         "already used in '%s'", colName, 
distributeParam.toSql(),
        :                     otherDistributeParam.toSql()));
        :               }
        :             }
        :             distributeParam.getColumnDefs().add(columnDef);
        :           }
        :           distributeParam.analyze(analyzer);
        :         }
        :       }
> Can you briefly comment at the beginning of that block what you're doing he
Done


Line 314: isKeyType
> nit: maybe rename to isSupportedKeyType
Done


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

Line 82: // The ColumnDefs that the colNames_ refer to. The caller of analyze() 
must set these
       :   // before calling.
       :   private final List<ColumnDef> colDefs_ = Lists.newArrayList();
> nit: can you move it below colNames_?
Done but this breaks up the fields set by the parser vs others. It's not really 
clear this is better.


Line 89: for (String name: colNames) {
       :       colNames_.add(name.toLowerCase());
       :     }
> single line
Done but where did this come from? I bet I can find a lot of for loops that 
could fit on a single line. I'm not sure changing things like this is worth the 
time.


Line 98: Preconditions.checkState(!colDefs_.isEmpty());
       :     Preconditions.checkState(colNames_.isEmpty() || colNames_.size() 
== colDefs_.size());
> Not sure I follow those checks. Is it ok to have colDefs_ which are not emp
Yes, added a comment to colNames_.


Line 101: Integer.MAX_VALUE
> Interesting, can Kudu really handle that many buckets? :)
Not sure, maybe they are planning for the future


Line 106: longValue()
> Can't you use that in L101 to avoid the creation of a BigDecimal instance?
That's probably not safe, BigDecimal could be bigger than a long.


Line 127: Preconditions.checkState(colType.isIntegerType() || 
colType.isStringType());
> I believe you already have a function for that check.
Done


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableDef.java
File fe/src/main/java/com/cloudera/impala/analysis/TableDef.java:

Line 48:   private List<ColumnDef> primaryKeyColDefs_ = Lists.newArrayList();
> final?
Done


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableName.java
File fe/src/main/java/com/cloudera/impala/analysis/TableName.java:

Line 73: if (db_ == null) {
       :       return tbl_;
> I don't think a fully qualified name can have only a table nane;
Ah right. I can't even figure out why I changed this. Reverted.


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

Line 155: TableLoadingException
> parseColumnType throws a TableLoadingException? this is weird...
Ya it does look weird. I think Impala basically "punted" on this. I haven't had 
time to look into it either.


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java:

Line 190: provide an
        :    * explicit name.
> That description is kind of awkward given that the input params are a datab
Reworded


-- 
To view, visit http://gerrit.cloudera.org:8080/2984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iba5714941a81bad7a410c850e4037637cffaf8ab
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2848: Simplify creation of a new Kudu table

Reply via email to