Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP ......................................................................
Patch Set 3: (26 comments) http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 235: nit: why the extra empty line? http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 209: else { : *expr = pool->Add(new NullLiteral(texpr_node)); : } Are we sure about that? If we can't parse it we convert it to null? Shouldn't we return a non-ok status instead? http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 202: DCHECK( : type.type == TYPE_STRING || type.type == TYPE_CHAR || type.type == TYPE_TIMESTAMP) : << type; formatting nit: I think you can condense them in two lines Line 254: TYPE_TIMESTAMP: Why don't we do the DCHECK(Parse...) for timestamps? http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: Line 358: CheckParse This function doesn't really check anything, correct? You're just overloading Parse. http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java File fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java: Line 126: new TimestampLiteral(exprNode.timestamp_literal.value); This is supposed to return an analyzed literal and I don't think your constructor does that (as it shouldn't). You should call the static create() function. http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java: Line 1: 5 6 :) Line 29: TimestampLiteral Don't you need to analyze TimestampLiteral? Where do you verify that the string value is a valid timestamp? Line 84: public int compareTo(LiteralExpr o) { Is it correct to enforce ordering of timestamp literals based on their string representation? The same for equals(). http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java: Line 182: slot.getType().isTimestamp() || bindingExpr.getType().isTimestamp() Why do you need both conditions? http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java: Line 571: Do we have test table with timestamp column? http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test File testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test: Line 60: timestamp_col=__HIVE_DEFAULT_PARTITION__ Is this hive's behavior as well? i.e. add a null partition if the timestamp if not valid? Shouldn't we throw an error instead? For example, what will happen if I provide a string in an integer partition column? Will it throw an error in that case? Also, can you add a test case with drop partition as well (both valid and invalid partition values). Line 180: ==== Can you also add a couple of more test cases: 1. An timestamp_col IN (timestamp values) predicate 2. timestamp_col + interval expressions in predicates http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test File testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test: Line 59: BETWEEN is supported Add also an IN predicate Line 214: ---- QUERY -- A 'T' character can be placed between a date and a time ... Line 247: are syntactically between two dates I am not sure I understand what that means exactly. Line 270: -- All bare times (without dates) are greater than all dates : select ts, ts >= '00:00:00' from t : where ts >= '00:00:00' : order by ts Why don't you just sort the rows in t by ts to demonstrate the applied ordering? http://gerrit.cloudera.org:8080/#/c/1621/3/tests/common/beeline.py File tests/common/beeline.py: Line 20: class Beeline(object): Can you add a class comment? http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_hms_integration.py File tests/metadata/test_hms_integration.py: Line 102: _ Is this a placeholder of some sort? http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: Line 196: 1987-05-29 15:45:44.6 Timestamp value does not seem to be consistent with the comment above. Line 208: # Create two paths '/i=0004/p=p2/t=06%3A07%3A08%3A.0' and : # "i=000004/p=p2/t=06%3A07%3A08%3A.00" Can we have a bit more pronounced difference? e.g. t1=2010-01-01 and t2=2010-01-01 00:00:00.000000000 Line 302: nit: extra empty line http://gerrit.cloudera.org:8080/#/c/1621/3/tests/query_test/test_partitioning.py File tests/query_test/test_partitioning.py: Line 78: test_hive_timestamp_partitions Brief comment on what you're testing here. Also, shouldn't this test be part of hive-impala integration tests? Line 89: extra space? Line 92: extra space? Line 96: assert ('1' == Before these assertions may do a "show partitions" to demonstrate how many partitions we expect to see on the Impala side. -- To view, visit http://gerrit.cloudera.org:8080/1621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-HasComments: Yes
