Jim Apple has posted comments on this change. Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP ......................................................................
Patch Set 3: (26 comments) http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 235: > nit: why the extra empty line? Done http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 209: else { : *expr = pool->Add(new NullLiteral(texpr_node)); : } > Are we sure about that? If we can't parse it we convert it to null? Shouldn Some Hive timestamps are invalid Impala timestamps, like ones before 1400AD. Returning non-ok status causes: I0307 14:20:14.831833 19548 status.cc:112] Invalid timestamp literal 1399-12-31 23:59:59.999999999 @ 0x10bfaa7 impala::Status::Status() @ 0x168e7cd impala::Expr::CreateExpr() @ 0x168dff6 impala::Expr::CreateTreeFromThrift() @ 0x168e2dd impala::Expr::CreateTreeFromThrift() @ 0x168db69 impala::Expr::CreateExprTree() @ 0x134d075 Java_com_cloudera_impala_service_FeSupport_NativeEvalConstExprs @ 0x7fdebdfd0c39 (unknown) E0307 14:20:14.831908 19548 expr.cc:148] Could not construct expr tree. Invalid timestamp literal 1399-12-31 23:59:59.999999999 in select count(*) from hive_timestamp_partitions.t where ts is null; in query_test/test_partitioning.py::TestPartitioning::test_hive_timestamp_partitions Dimitris and I talked about allowing syntactically valid timestamps from Hive that are semantically invalid ones in Impala, like ones from 1399 or before, while disallowing complete garbage, like "THIS IS NOT A TIMESTAMP". By "allowing" those Hive timestamps, I mean turning them into NULL values, and by "disallowing", I mean returning an error to the user. While this makes some sense to me, in that we prevent a user from accidentally making some NULL timestamp partitions, it is not completely consistent with how we handle invalid timestamps in non-partition columns, which looks like: WARNINGS: Error converting column: 0 TO TIMESTAMP (Data is: 1399-10-10 01:01:01) (and then treating the value as NULL) Maybe it's better to warn here? http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 202: DCHECK( : type.type == TYPE_STRING || type.type == TYPE_CHAR || type.type == TYPE_TIMESTAMP) : << type; > formatting nit: I think you can condense them in two lines Done Line 254: TYPE_TIMESTAMP: > Why don't we do the DCHECK(Parse...) for timestamps? Done. http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: Line 358: CheckParse > This function doesn't really check anything, correct? You're just overloadi Kind of. Parse actually returns a result by modifying parameters 3 and 4. This just returns true if s is parsable. I've added a comment. Thoughts on the naming? Maybe "IsParsable"? http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java File fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java: Line 126: new TimestampLiteral(exprNode.timestamp_literal.value); > This is supposed to return an analyzed literal and I don't think your const Yes, that analysis all happens in the backend. http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java: Line 1: 5 > 6 :) Done Line 29: TimestampLiteral > Don't you need to analyze TimestampLiteral? Where do you verify that the st In canEvalUsingPartitionMd, we don't eval timestamps for validity in the frontend. Line 84: public int compareTo(LiteralExpr o) { > Is it correct to enforce ordering of timestamp literals based on their stri I think so, because partition pruning uses the backend. In particular, some of the tests test ordering, and timestamps in partitions seem to behave the same as timestamps in other columns. http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java: Line 182: slot.getType().isTimestamp() || bindingExpr.getType().isTimestamp() > Why do you need both conditions? It looks to me like BinaryPredicates can compare different types. If that's so, then I think the answer to your question is yes. http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java: Line 571: > Do we have test table with timestamp column? alltypes has a timestamp column. I don't think I understand what you're getting at, though. http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test File testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test: Line 60: timestamp_col=__HIVE_DEFAULT_PARTITION__ > Is this hive's behavior as well? i.e. add a null partition if the timestamp Hive does not allow the partition to be added. Do we want to mimic that? If so, we will have an asymmetry between what partitions we can add (no null values) and what partitions we can read (null partition values are fine). Line 180: ==== > Can you also add a couple of more test cases: Done http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test File testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test: Line 59: BETWEEN is supported > Add also an IN predicate Done Line 214: ---- QUERY > -- A 'T' character can be placed between a date and a time ... Done Line 247: are syntactically between two dates > I am not sure I understand what that means exactly. Added explanation Line 270: -- All bare times (without dates) are greater than all dates : select ts, ts >= '00:00:00' from t : where ts >= '00:00:00' : order by ts > Why don't you just sort the rows in t by ts to demonstrate the applied orde I added a test using order by, as well, to ensure that where >= '00:00:00' and ORDER BY have similar results. http://gerrit.cloudera.org:8080/#/c/1621/3/tests/common/beeline.py File tests/common/beeline.py: Line 20: class Beeline(object): > Can you add a class comment? Done http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_hms_integration.py File tests/metadata/test_hms_integration.py: Line 102: _ > Is this a placeholder of some sort? This is to make HiveDbWrapper.__init__ and ImalaDbWrapper.__init__ have the same type, so that lines like 290 work. http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: Line 196: 1987-05-29 15:45:44.6 > Timestamp value does not seem to be consistent with the comment above. Which one? The two paths in the comment are intentionally different. Line 208: # Create two paths '/i=0004/p=p2/t=06%3A07%3A08%3A.0' and : # "i=000004/p=p2/t=06%3A07%3A08%3A.00" > Can we have a bit more pronounced difference? e.g. t1=2010-01-01 and t2=201 I've added more zeros, but I've kept the time-only nature of the values to ensure that case works. Line 302: > nit: extra empty line Done http://gerrit.cloudera.org:8080/#/c/1621/3/tests/query_test/test_partitioning.py File tests/query_test/test_partitioning.py: Line 78: test_hive_timestamp_partitions > Brief comment on what you're testing here. Also, shouldn't this test be par Added comment. As far as test location, my feeling is that the test belongs here because it is specifically about timestamp partitions, while the only other test with 'integration' in its name I see is metadata/test_hms_integration.py. Line 89: > extra space? Done Line 92: > extra space? Done Line 96: assert ('1' == > Before these assertions may do a "show partitions" to demonstrate how many Done -- To view, visit http://gerrit.cloudera.org:8080/1621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-HasComments: Yes
