Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3369: Add ALTER TABLE SET COLUMN STATS statement.
......................................................................


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/3189/4/fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetColumnStats.java
File 
fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetColumnStats.java:

Line 35: 'numDVs'='100','numNulls'='10','avgSize'='5.4','maxSize'='50'
nit: I think I prefer the generic description you have in the commit msg


Line 44: public final static String NDV_KEY = "numDVs";
       :   public final static String NUM_NULLS_KEY = "numNulls";
       :   public final static String AVG_SIZE_KEY = "avgSize";
       :   public final static String MAX_SIZE_KEY = "maxSize";
Don't you think these fields and the logic in isValidStatsKey() and 
setStatsValue() fns belong to the ColumnStats class?


Line 93: if (!isValidStatsKey(entry.getKey())) {
       :         throw new AnalysisException(String.format(
       :             "Invalid column stats key: %s\nValid keys are: %s",
       :             entry.getKey(), Joiner.on(',').join(ALL_STATS_KEYS)));
       :       }
I think you can push this check in L165. Actually, if you do that, you probably 
don't need the isValidStatsKey function.


Line 87: // Copy the existing stats and then change the values according to the
       :     // stats map of this stmt. The existing stats are first copied to 
preserve
       :     // those stats values that are not changed by this stmt because 
all stats
       :     // values are updated when altering the stats in the HMS.
       :     colStats_ = col.getStats().clone();
       :     for (Map.Entry<String, String> entry: statsMap_.entrySet()) {
       :       if (!isValidStatsKey(entry.getKey())) {
       :         throw new AnalysisException(String.format(
       :             "Invalid column stats key: %s\nValid keys are: %s",
       :             entry.getKey(), Joiner.on(',').join(ALL_STATS_KEYS)));
       :       }
       :       setStatsValue(entry.getKey(), entry.getValue(), col, colStats_);
       :     }
Same comment as above. This looks like something that belongs to 
ColumnStats.analyze(). What do you think?


Line 120: equalsIgnoreCase(
Instead of having all the equalsIgnoreCase() calls it may be more robust if 
your store ALL_STATS_KEYS in lower case and then call the 
statsKey.toLowerCase() once. It's easy to miss a call to ignore case if we ever 
modify this code again :)


Line 134: catch (Exception e) {
        :       }
Alternatively, you can wrap the setStatsValue in a try catch that catches only 
NumberFormatException and converts it into an proper AnalysisException instead 
of having the empty catch here and in L153. In this function you can still 
handle logical errors in setting column stat values.


Line 176: Map<String, TColumnStats> columnStats = Maps.newHashMap();
        :    columnStats.put(colName_.toString(), colStats_.toThrift());
        :    updateStatsParams.setColumn_stats(columnStats);
I believe thrift will generate a putToColumn_stats() function that you can use 
instead of L176-178.


http://gerrit.cloudera.org:8080/#/c/3189/4/fe/src/main/java/com/cloudera/impala/analysis/AlterTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/AlterTableStmt.java:

Line 70: table_ instanceof KuduTable
       :         && !(this instanceof AlterTableSetTblProperties)
       :         && !(this instanceof AlterTableSetColumnStats)
       :         && !(this instanceof AlterTableOrViewRenameStmt
I wonder if we should have some kind of isSupportedStmt() function that gets 
overridden by each table type and has the appropriate checks.


http://gerrit.cloudera.org:8080/#/c/3189/4/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

Line 24: import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
       : import org.apache.hadoop.hive.metastore.api.FieldSchema;
       : import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
       : import org.apache.log4j.Logger;
       : import org.kududb.client.KuduClient;
       : import org.kududb.client.LocatedTablet;
Out of curiosity, why did you move them here?


http://gerrit.cloudera.org:8080/#/c/3189/4/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

Line 598: 
For completion you may want to add tests where the keys are in upper case, 
lower case, camel-case, etc.


Line 620: 
Also add a case where the same key is specified twice ("numNulls'='2', 
'numNulls'=3). Who would have thought of such a thing :)


http://gerrit.cloudera.org:8080/#/c/3189/4/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

Line 324: _create_db
Shouldn't we be using the unique db fixture?


-- 
To view, visit http://gerrit.cloudera.org:8080/3189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45cd8aa7241ea962788ba9ca7d0bbfd864c4304f
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

Reply via email to