[Impala-CR](cdh5-trunk) Review Only: Kudu Impala integration

David Ribeiro Alves (Code Review) Tue, 08 Mar 2016 09:37:50 -0800

David Ribeiro Alves has posted comments on this change.

Change subject: Review Only: Kudu Impala integration
......................................................................



Patch Set 6:

(94 comments)

http://gerrit.cloudera.org:8080/#/c/1403/6/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

Line 124:   /// Protects access to state accessed by scanner threads, such as 
'status_' or 'done_'.
> if done_ is protected by the lock, then why is it volatile?
done_ is actually sometimes not accessed within the lock (mostly for reads). 
this is legacy from nong's original impl, but I think it makes sense.


Line 128:   /// occurs in a scanner.
> Please say: "Protected by lock_" here and any other members that are protec
added thsi here, on num_active_scanners_ and on max_materialized_row_batches_


http://gerrit.cloudera.org:8080/#/c/1403/6/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

Line 112: to_lower_copy(schema.Column(i).name())
> this one seemed to get lost -- should use lower_case_col_name variable rath
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 42:        IdxByLowerCaseColName* map);
> this was lost too -- fix indentation.
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 185:   // Indicates whether this is a Kudu column, If true implies all 
following Kudu specific
> "column. "
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 193:   // The timestamp on which to perform the scan for the query.
> rephrase, the query might have multiple scan nodes. "The timestamp as of wh
Done


Line 198:   11: optional i64 timestamp = -1;
> "timestamp" is too generic (timestamp of what?)
changed this to snapshot timestamp


http://gerrit.cloudera.org:8080/#/c/1403/6/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 164:   2: required i32 buckets
> num_buckets
Done


Line 187:   1: required TDistributeType type
> type is redundant (by_has_paramn.__isset implies hash, for instance)
Done


Line 433:   // If set, the table is automatically distributed according to this 
parameter
> ". Kudu-only."
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 887:   // follow the PARTITION feature of insert from select statements.
> don't drop this comment
Done


Line 664:   opt_with_clause:w KW_INSERT KW_OVERWRITE opt_ignore:ignore
> i'm not in love with the 'ignore' syntax, it's cryptic (what's being ignore
ok, created IMPALA-3145


Line 665:   opt_kw_table table_name:table
> move to previous line (and generally pack the lines with the syntactic elem
Done


Line 970:   distribute_param_list:distribute tbl_properties:tbl_props
> save one line (and apply elsewhere, there are lots of instances like that)
Done


Line 990:     // columns will be added by the query statement during analysis.
> update comment; you're passing in partition_cols a few lines downn
Done


Line 997: 
> remove blank line
Done


Line 1077:   {: RESULT = 
DistributeParam.createHashParam(DistributeParam.Type.HASH, cols, buckets); :}
> long line (and elsewhere)
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/AlterTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/AlterTableStmt.java:

Line 70: 
> remove blank line (generally, try to emulate the prevailing formatting)
Done


Line 75: 
> same here
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java:

Line 116:    * predicate into literals, if possible.
> add "todo: create a more general mechanism and retire this function"
Done


Line 118:   public static BinaryPredicate 
normalizeAndFoldConstants(BinaryPredicate predicate,
> rename to normalizeSlotRefComparison, that makes it clearer what the normal
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java:

Line 68:         fileFormat, location, cachingOp, ifNotExists, tblProperties, 
serdeProperties, null);
> long line
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 96:     this.columnDefs_ = Lists.newArrayList(columnDefs);
> remove "this." everywhere
Done


Line 190:       for(DistributeParam d : distributeParams_) {
> formatting
Done


Line 355:     if (!KuduTable.tableParamsAreValid(getTblProperties())) {
> have this function return a more specific error string
left a TODO will mark for a follow up.


Line 371:       for(DistributeParam d : distributeParams_) {
> formatting
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

Line 50:   public static DistributeParam createHashParam(Type t, List<String> 
cols,
> the type is redundant, remove
Done


Line 58:   public static DistributeParam createRangeParam(Type t, List<String> 
cols,
> same here
Done


Line 77:   private final int buckets_;
> num_buckets_
Done


Line 82:   private TDistributeByRangeParam rangeParam_;
> comment: set in analyze()
Done


Line 100:   public void analyze(Analyzer analyzer) throws AnalysisException {
> this doesn't analyze columns_ (i'm assuming there are restrictions on the d
for this and for the relevant comments below, which require a refactor of this 
method's logic to include the analysis of 'columns_', I would prefer to file a 
jira/leave a TODO including them and perform the refactor by itself after the 
merge.


Line 106:       // Creating the thrift structure here allows to throw 
meaningful exceptions.
> i think what this is trying to say is that by creating the thrift structure
Done


Line 115:               "of projected key columns: %d != %d", splitRow.size(), 
columns_.size()));
> if someone lists a lot of split rows, it can be hard to find the offending 
Done


Line 122:             if (num.getType().isDecimal() || 
num.getType().isFloatingPointType()) {
> this should happen during the analysis of columns_. here you should only ch
again, postponing 'columns_' analysis post-merge, left a TODO


Line 135:             throw new AnalysisException(String.format("Split row 
value: %s (Type: %s) is not "
> long line. how about "Split row value is not supported: %s (Type %s)" (that
Done


Line 161:       return String.format("HASH(%s) INTO %d BUCKETS", Joiner.on(", 
").join(columns_), buckets_);
> long line. please fix those everywhere before resubmitting.
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/FromClause.java
File fe/src/main/java/com/cloudera/impala/analysis/FromClause.java:

Line 73:     // All tableRefs have been analyzed, but at least one table was 
found missing.
> was found missing -> is missing metadata
Done


Line 74:     // There is no reason to proceed with analysis past this point.
> remove this sentence, it only made sense in the context of SelectStmt
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

Line 665:   public DataSink createDataSink() {
> optionally rename to createTableSink and return a TableSink
the correct way to to this seems like it would be to move the createdatasink 
logic into tablesink (since it only creates tablesinks).  don't want to go into 
refactors of existing code in a merge patch leaving a TODO


Line 668:     return DataSink.createDataSink(table_, partitionKeyExprs_, 
overwrite_,  ignoreDuplicates_);
> long line
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/ModifyStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ModifyStmt.java:

Line 87:   // On tables with a primary key, ignore key not found errors.
> why are those considered errors? in sql, an update/delete with a where clau
not sure what was the reasoning behing the original approach, but the point 
here is that updates that get sent to kudu but can't update anything return an 
error. this indicates whether the error should be ignored.
you might be right that we should reverse the behavior but that is a major 
refactor that I dont' want to include in a merge patch.
I can create a jira though, created IMPALA-3145


Line 99: 
> add a createTableSink()
you mean a createDataSink()? see below (also see comment in InsertStmt)


Line 192:     for(int i = keyColumnsOffset; i < srcStmt.resultExprs_.size(); 
++i) {
> formatting
Done


Line 242:       rhsExpr.analyze(analyzer);
> move right below l239
Done


Line 243:       lhsSlotRef.analyze(analyzer);
> similar
Done


Line 251:                 lhsSlotRef.toSql(),
> pack line more tightly
Done


Line 269:                     "reference", lhsSlotRef.toSql(), 
rhsExpr.toSql()));
> indentation
Done


Line 273:         throw new AnalysisException(format("Key column '%s' cannot be 
updated.",
> single line, if it fits
doesn't


Line 279:             format("Duplicate value assignment to column: '%s'",
> single line
Done


Line 288:             c.getName(), c.getType().toSql(), rhsExpr.toSql(),
> single line
Done


Line 299: 
> remove blank lines
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/StmtRewriter.java
File fe/src/main/java/com/cloudera/impala/analysis/StmtRewriter.java:

Line 57:       throw new AnalysisException("Update performs internal rewrite 
only.");
> if this is an internal error, it should be a checkState()
check comment above


Line 57:       throw new AnalysisException("Update performs internal rewrite 
only.");
> cryptic error message - what does this mean? also, why did this become an e
this is something that I fixed but got lost and was reverted to the original. 
in the first version of the merge (and now again) update and delete rewrite the 
subqueries as any other statement


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java:

Line 45:   public UpdateStmt(List<String> targetTablePath,
> fill up lines
Done


Line 63:     return KuduTableSink.createUpdateSink(table_, referencedColumns_,
> single line
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/catalog/KuduColumn.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduColumn.java:

Line 31:     this.isKey_ = isKey;
> remove this.
Done


Line 35:   /**
> remove comments for getters and blank lines between them
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

Line 59:   // Alias to the string key that identifies the storage handler for 
tables.
> "for Kudu tables"
Done


Line 75:   // Kudu specific value for the storage handler table property keyed 
by KEY_STORAGE_HANDLER.
> long line
Done


Line 135:         && 
KUDU_STORAGE_HANDLER.equals(mstbl.getParameters().get(KEY_STORAGE_HANDLER));
> shouldn't just this one be sufficient?
Done


Line 145:       throw new TableLoadingException(String.format("Kudu tables must 
have at least one key "
> long line
Done


Line 163:     if (Sets.intersection(columnNames, keyColumns).size() < 
keyColumns.size()) {
> isn't this sufficient: if (!columnNames.containsAll(keyColumns))
Done


Line 186:     String keyColumnsProp = 
Preconditions.checkNotNull(msTbl.getParameters().get(KEY_KEY_COLUMNS)
> long line (please look for those and fix all of them)
Done


Line 187:         .toLowerCase(), "'kudu.key_columns' cannot be null.");
> never mind.
ok


Line 187:         .toLowerCase(), "'kudu.key_columns' cannot be null.");
> we only use Preconditions for internal logic errors, not expected user erro
ok


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java
File fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java:

Line 52:   public DdlDelegate setDistributeParams(List<TDistributeParam> p) {
> how this is abstract? (this only applies to kudu)
what do you mean, the class? I didn't implement this but it seems that there is 
one for kudu and one for everything else and both inherit from this class


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/planner/DataSink.java
File fe/src/main/java/com/cloudera/impala/planner/DataSink.java:

Line 61:       boolean overwrite, boolean ignoreDuplicates_) {
> no trailing _
Done


Line 76:       return KuduTableSink.createInsertSink(table, ignoreDuplicates_);
> what about updates?
Refactored this do do all operations


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 73:         analysisResult.isUpdateStmt() || analysisResult.isDeleteStmt())
> stick with existing formatting:
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 68:   private final Set<Integer> hostIndexSet_ = Sets.newHashSet();
> explain
Done


Line 84:     // Extract predicates that can evaluated by Kudu.
> can be
Done


Line 114:       List<LocatedTablet> tabletLocations =
> i'm hoping this is cached in the client.
this is, but we create a new client above, so we do this rpc everytime. we need 
to work on making the client survive across queries


Line 154:     double baseSelectivity = super.computeSelectivity();
> take a look at what PlanNode.computeSelectivity() does now (we removed the 
I don't want to tinker with the selectivity computation on the merge patch. 
particularly because this should be done properly, with a test. created a jira 
( IMPALA-3148) and left a TODO


Line 174:     cardinality_ = Math.max(1, cardinality_);
> and also not > numRows
Done


Line 221:    * of the result can be evaluated with Expr::GetValue(NULL)).
> doesn't describe side effects on conjuncts param, which happens to be conju
done for the most part. the side effect (that the predicates are no longer in 
conjuncts_) is likely clear from the work "extract" right?


Line 273:   private static BinaryPredicate 
transformExclusiveIntLiteralPredicatesToInclusive(
> really awkward name; rename (normalizeIntLiteralComparison?)
Done


Line 284:           if (intValue >= Byte.MAX_VALUE) return comparisonPred;
> if it's >= max_value, can't you safely substitute this predicate with 'fals
left a TODO to do this.


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
File fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java:

Line 33:  * data from a plan fragment into an Kudu table using the Kudu client.
> "a Kudu"
Done


Line 42:   private ArrayList<Integer> targetColdIdxs_;
> why "Cold"? (as opposed to Col)
Done


Line 55:   public static KuduTableSink createInsertSink(Table targetTable,
> get rid of these static functions
refactored all of this TableSink.create


Line 105:   public enum Type {
> migrate into TableSink and call this Op
Done


Line 111:       public TTableSinkType toThrift() { return 
TTableSinkType.KUDU_INSERT; }
> similar changes here
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 984:         handler.dropTable();
> do we have a handler for hbase?
we have the NoOpDelegate for everything non-kudu


Line 1267:       // Forward the opeation to a specific storage backend. If the 
operation fails,
> spelling
Done


Line 1306:    * available for the table, returns a NoOpDelegate instance.
> i'd assume that whatever we do for kudu should be mirrored for hbase.
agreed, probably not here though. Any specific action item you're recommending?


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java:

Line 53:   public static boolean compareSchema(Table hiveTable, KuduTable 
kuduTable)
> msTable instead of hiveTable (apply throughout this function)
Done


Line 66:       if (kuduField == null || fromImpalaType(
> change indentation to make this more parseable
Done


Line 85:   public static List<PartialRow> parseSplits(Schema schema, String 
kuduSplits)
> who calls this?
KuduDdlDelegate


Line 142:   private static void setKey(PartialRow key, Type type, JsonArray 
array, int pos)
> comment
Done


Line 157:   private static void setKey(PartialRow key, Type type, TRangeLiteral 
literal, int pos,
> comment
Done


Line 253:   static String ToString(TRangeLiteral l) throws 
ImpalaRuntimeException {
> toString()
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

Line 1617:         "range(a) split rows ((1),('abc'),(3)) " +
> this is obviously wrong
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeModifyStmtsTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeModifyStmtsTest.java:

Line 72:     AnalyzesOk("delete from functional_kudu.testtbl");
> delete ignore is missing
Done


http://gerrit.cloudera.org:8080/#/c/1403/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test:

Line 4: |  ignoreKeysNotFoundOrDuplicate: false
> we don't use camel case here. also, something more succinct would be nice, 
a problem here is that the keyword means two different things. for insert it 
means "ignore duplicate keys" and for update/delete means "ignore missing 
keys". I refactored the explain string to output different things for different 
ops and to avoid the double negative (this required flipping the flag, which I 
did for printing purposes only)


-- 
To view, visit http://gerrit.cloudera.org:8080/1403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I239314acbc8434ef673a3a59d2a82a0338ea5876
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Martin Grund <[email protected]>
Gerrit-Reviewer: Silvius Rus <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) Review Only: Kudu Impala integration

Reply via email to