Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP
......................................................................


Patch Set 3:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 235: 
nit: why the extra empty line?


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 209: else {
        :         *expr = pool->Add(new NullLiteral(texpr_node));
        :       }
Are we sure about that? If we can't parse it we convert it to null? Shouldn't 
we return a non-ok status instead?


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 202: DCHECK(
        :       type.type == TYPE_STRING || type.type == TYPE_CHAR || type.type 
== TYPE_TIMESTAMP)
        :       << type;
formatting nit: I think you can condense them in two lines


Line 254: TYPE_TIMESTAMP:
Why don't we do the DCHECK(Parse...) for timestamps?


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 358: CheckParse
This function doesn't really check anything, correct? You're just overloading 
Parse.


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java:

Line 126: new TimestampLiteral(exprNode.timestamp_literal.value);
This is supposed to return an analyzed literal and I don't think your 
constructor does that (as it shouldn't). You should call the static create() 
function.


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java
File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java:

Line 1: 5
6 :)


Line 29: TimestampLiteral
Don't you need to analyze TimestampLiteral? Where do you verify that the string 
value is a valid timestamp?


Line 84: public int compareTo(LiteralExpr o) {
Is it correct to enforce ordering of timestamp literals based on their string 
representation? The same for equals().


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java:

Line 182: slot.getType().isTimestamp() || bindingExpr.getType().isTimestamp()
Why do you need both conditions?


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

Line 571: 
Do we have test table with timestamp column?


http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
File 
testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test:

Line 60: timestamp_col=__HIVE_DEFAULT_PARTITION__
Is this hive's behavior as well? i.e. add a null partition if the timestamp if 
not valid? Shouldn't we throw an error instead? For example, what will happen 
if I provide a string in an integer partition column? Will it throw an error in 
that case? Also, can you add a test case with drop partition as well (both 
valid and invalid partition values).


Line 180: ====
Can you also add a couple of more test cases:
1. An timestamp_col IN (timestamp values) predicate
2. timestamp_col + interval expressions in predicates


http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test
File 
testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test:

Line 59: BETWEEN is supported
Add also an IN predicate


Line 214: ---- QUERY
-- A 'T' character can be placed between a date and a time ...


Line 247: are syntactically between two dates
I am not sure I understand what that means exactly.


Line 270: -- All bare times (without dates) are greater than all dates
        : select ts, ts >= '00:00:00' from t
        : where ts >= '00:00:00'
        : order by ts
Why don't you just sort the rows in t by ts to demonstrate the applied ordering?


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/common/beeline.py
File tests/common/beeline.py:

Line 20: class Beeline(object):
Can you add a class comment?


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

Line 102: _
Is this a placeholder of some sort?


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

Line 196: 1987-05-29 15:45:44.6
Timestamp value does not seem to be consistent with the comment above.


Line 208: # Create two paths '/i=0004/p=p2/t=06%3A07%3A08%3A.0' and
        :     # "i=000004/p=p2/t=06%3A07%3A08%3A.00"
Can we have a bit more pronounced difference? e.g. t1=2010-01-01 and 
t2=2010-01-01 00:00:00.000000000


Line 302: 
nit: extra empty line


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/query_test/test_partitioning.py
File tests/query_test/test_partitioning.py:

Line 78: test_hive_timestamp_partitions
Brief comment on what you're testing here. Also, shouldn't this test be part of 
hive-impala integration tests?


Line 89:  
extra space?


Line 92:  
extra space?


Line 96: assert ('1' ==
Before these assertions may do a "show partitions" to demonstrate how many 
partitions we expect to see on the Impala side.


-- 
To view, visit http://gerrit.cloudera.org:8080/1621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-HasComments: Yes

Reply via email to