[Impala-CR](cdh5-trunk) IMPALA-2848: Simplify creation of a new Kudu table

Dimitris Tsirogiannis (Code Review) Wed, 18 May 2016 11:04:22 -0700

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2848: Simplify creation of a new Kudu table
......................................................................



Patch Set 4:

(20 comments)

Initial pass. Haven't gone through the tests yet.

http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 51: private final boolean isPrimaryKey_;
You may want to add a comment. This is specific to kudu tables only.


Line 178: mapByColumnNames(
maybe "getNameToColumnDefinitionMap()"


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java:

Line 195: 
        : 
I thought this was only for Kudu tables.


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 104: those are not included here
Out of curiosity, why is that the case?


Line 179: throwIfPrimaryKeysWereSpecified
Can't you use the generic throwIfNotNullOrNotEmpty here as well?


Line 211: // TODO: Find out what is creating a directory in HDFS and stop doing 
that. Kudu
        :     //       tables shouldn't have HDFS dirs.
Is this still relevant? Maybe create a JIRA and reference it here?


Line 221: throwIfPrimaryKeysWereSpecified(
Same comment as in L179


Line 233: else {
I would put the entire else block in a separate function.


Line 265: if (!getDistributeParams().isEmpty()) {
        :         Map<String, ColumnDef> primaryKeyColDefsByName
        :             = ColumnDef.mapByColumnNames(getPrimaryKeyColumnDefs());
        :         Map<String, DistributeParam> hashParamsByColName = 
Maps.newHashMap();
        :         for (DistributeParam distributeParam: getDistributeParams()) {
        :           List<String> colNames;
        :           if (distributeParam.getColumnNames().isEmpty()) {
        :             colNames = 
ColumnDef.toColumnNames(getPrimaryKeyColumnDefs());
        :           } else {
        :             colNames = distributeParam.getColumnNames();
        :           }
        :           for (String colName : colNames) {
        :             ColumnDef columnDef = 
primaryKeyColDefsByName.get(colName);
        :             if (columnDef == null) {
        :               throw new AnalysisException(String.format("Column '%s' 
in '%s' is not " +
        :                       "a key column. Only key columns can be used in 
DISTRIBUTE BY.",
        :                   colName, distributeParam.toSql()));
        :             }
        :             if (distributeParam.getType() == Type.HASH) {
        :               DistributeParam otherDistributeParam = 
hashParamsByColName.get(colName);
        :               if (otherDistributeParam == null) {
        :                 hashParamsByColName.put(colName, distributeParam);
        :               } else {
        :                 throw new AnalysisException(String.format("Column 
'%s' used in '%s' is " +
        :                         "already used in '%s'", colName, 
distributeParam.toSql(),
        :                     otherDistributeParam.toSql()));
        :               }
        :             }
        :             distributeParam.getColumnDefs().add(columnDef);
        :           }
        :           distributeParam.analyze(analyzer);
        :         }
        :       }
Can you briefly comment at the beginning of that block what you're doing here?


Line 314: isKeyType
nit: maybe rename to isSupportedKeyType


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

Line 82: // The ColumnDefs that the colNames_ refer to. The caller of analyze() 
must set these
       :   // before calling.
       :   private final List<ColumnDef> colDefs_ = Lists.newArrayList();
nit: can you move it below colNames_?


Line 89: for (String name: colNames) {
       :       colNames_.add(name.toLowerCase());
       :     }
single line


Line 98: Preconditions.checkState(!colDefs_.isEmpty());
       :     Preconditions.checkState(colNames_.isEmpty() || colNames_.size() 
== colDefs_.size());
Not sure I follow those checks. Is it ok to have colDefs_ which are not empty 
and colNames_ which are empty?


Line 101: Integer.MAX_VALUE
Interesting, can Kudu really handle that many buckets? :)


Line 106: longValue()
Can't you use that in L101 to avoid the creation of a BigDecimal instance?


Line 127: Preconditions.checkState(colType.isIntegerType() || 
colType.isStringType());
I believe you already have a function for that check.


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableDef.java
File fe/src/main/java/com/cloudera/impala/analysis/TableDef.java:

Line 48:   private List<ColumnDef> primaryKeyColDefs_ = Lists.newArrayList();
final?


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/analysis/TableName.java
File fe/src/main/java/com/cloudera/impala/analysis/TableName.java:

Line 73: if (db_ == null) {
       :       return tbl_;
I don't think a fully qualified name can have only a table nane;


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

Line 155: TableLoadingException
parseColumnType throws a TableLoadingException? this is weird...


http://gerrit.cloudera.org:8080/#/c/2984/4/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java:

Line 190: provide an
        :    * explicit name.
That description is kind of awkward given that the input params are a database 
and a table name which are used to create a name :)


-- 
To view, visit http://gerrit.cloudera.org:8080/2984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iba5714941a81bad7a410c850e4037637cffaf8ab
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2848: Simplify creation of a new Kudu table

Reply via email to