Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP
......................................................................

Patch Set 3:

(6 comments)

Wanted to flush some responses to previous comments. Starting a new round now.

http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 209: else {
        :   *expr = pool->Add(new NullLiteral(texpr_node));
        : }
> Some Hive timestamps are invalid Impala timestamps, like ones before 1400AD
It is debatable whether turning invalid values into NULLs is the correct thing to do, but I can accept that for consistency reasons. For the same reasons, though, I think creating a NULL partition because a bogus value was passed in the DDL is not consistent with the existing behavior for other data types.

http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 358: CheckParse
> Kind of. Parse actually returns a result by modifying parameters 3 and 4. T
Why not just Parse?

http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java
File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java:

Line 84: public int compareTo(LiteralExpr o) {
> I think so, because partition pruning uses the backend. In particular, some
I am not so sure about that. If this Literal implements compareTo, it sends the wrong message that I can use it in the FE to perform static partitioning, which, as we know, is not true because parsing is performed in the backend.

http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java:

Line 182: slot.getType().isTimestamp() || bindingExpr.getType().isTimestamp()
> It looks to me like BinaryPredicates can compare different types. If that's
But why do you care about the type of the binding expr?
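
To make the expr.cc and partition-col-types.test points concrete, here is a rough sketch of the behavior I am arguing for. It is written in Python only for brevity. Everything in it is hypothetical. parse_timestamp_partition_value and InvalidPartitionValueError are illustrative names, not existing Impala code. The accepted formats are simplified. Only the 1400 lower bound comes from this discussion; the 9999-12-31 upper bound is an assumption. In this sketch, Hive's default partition name is the only value that maps to NULL, and any other value that is not a valid timestamp raises an error.

```python
from datetime import datetime

# Assumed TIMESTAMP range: the 1400 lower bound comes from the review
# discussion; the 9999-12-31 upper bound is an assumption.
MIN_TS = datetime(1400, 1, 1)
MAX_TS = datetime(9999, 12, 31, 23, 59, 59, 999999)

HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


class InvalidPartitionValueError(ValueError):
    """Hypothetical error for an unparsable or out-of-range partition value."""


def parse_timestamp_partition_value(raw):
    """Hypothetical helper: map a partition value from the DDL to a timestamp.

    Only Hive's default-partition name maps to NULL (None here). Any other
    value that is not a valid timestamp raises instead of becoming NULL.
    Accepted formats (simplified): 'YYYY-MM-DD' and
    'YYYY-MM-DD HH:MM:SS[.f...]' with 1 to 9 fractional digits.
    """
    if raw == HIVE_DEFAULT_PARTITION:
        return None
    base, dot, frac = raw.partition(".")
    has_time = " " in base
    # A fraction is only allowed after a time component, with 1-9 digits.
    if dot and not (has_time and frac.isdigit() and len(frac) <= 9):
        raise InvalidPartitionValueError(f"invalid timestamp: {raw!r}")
    fmt = "%Y-%m-%d %H:%M:%S" if has_time else "%Y-%m-%d"
    try:
        ts = datetime.strptime(base, fmt)
    except ValueError as exc:
        raise InvalidPartitionValueError(f"invalid timestamp: {raw!r}") from exc
    if dot:
        # datetime only keeps microseconds, so digits past the sixth are dropped.
        ts = ts.replace(microsecond=int(frac[:6].ljust(6, "0")))
    if not MIN_TS <= ts <= MAX_TS:
        raise InvalidPartitionValueError(f"timestamp out of range: {raw!r}")
    return ts
```

Under this sketch, a partition spec such as timestamp_col=1399-12-31 would be rejected with an error. It would not silently create a NULL partition. timestamp_col=__HIVE_DEFAULT_PARTITION__ still maps to NULL.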

http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
File testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test:

Line 60: timestamp_col=__HIVE_DEFAULT_PARTITION__
> Hive does not allow the partition to be added.
My main point is that I don't believe we should convert an invalid timestamp into a NULL and create the partition. I believe an error should be thrown in a case like this.

http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

Line 196: 1987-05-29 15:45:44.6
> Which one? The two paths in the comment are intentionally different.
There are two paths mentioned in L188-189. The one used here doesn't match either of them. Is this intentional? Maybe I am missing something. Too many zeroes :P

-- 
To view, visit http://gerrit.cloudera.org:8080/1621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Jim Apple
Gerrit-HasComments: Yes
