Casey Ching has posted comments on this change.

Change subject: Simplify creating external Kudu tables and add DROP DATABASE 
CASCADE
......................................................................


Patch Set 3:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/2617/3//COMMIT_MSG
Commit Message:

Line 23: KUDU as a file format
> what is Hive planning to do here? Ideally we can be compatible
Kudu isn't supported at all in Hive yet. As far as I know there's no way to use 
it. Someone (from Cloudera) is working on it though. 
https://issues.apache.org/jira/browse/HIVE-12971 is for this. We need to 
coordinate with them eventually.


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1243:   {: RESULT = null; :}
> why change this?
Row format is specific to text files. So it should never be used with non-text 
files. As it was, there was no way to differentiate between a user specifying 
the default row format or not specifying the row format. Now if the value is 
null, that indicates no row format was given.


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 123: f (comment_ != null ?
        :         !comment_.equals(columnDef.comment_) : columnDef.comment_ != 
null) return false;
> these nested ternary statements are hard to read
Heh, this is all generated code so I didn't really look at it much.


Line 118: public boolean equals(Object o) {
        :     if (this == o) return true;
        :     if (o == null || getClass() != o.getClass()) return false;
        :     ColumnDef columnDef = (ColumnDef) o;
        :     if (!colName_.equals(columnDef.colName_)) return false;
        :     if (comment_ != null ?
        :         !comment_.equals(columnDef.comment_) : columnDef.comment_ != 
null) return false;
        :     if (typeDef_ != null ?
        :         !typeDef_.equals(columnDef.typeDef_) : columnDef.typeDef_ != 
null) return false;
        :     return !(type_ != null ? !type_.equals(columnDef.type_) : 
columnDef.type_ != null);
        :   }
> would be cleaner to use org.apache.commons.lang.builder.EqualsBuilder
Done. Btw if you use a common IDE there is probably a plugin to generate using 
that builder https://plugins.jetbrains.com/plugin/7283?pr=


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 81:  
> nit: odd spacing
Changed to do what the java people seem to do. I looked at some of their code.


Line 194: setTable_properties(tblProperties_);
> could the behavior be different in cases where this wasn't set previously?
I guess it's possible but there doesn't seem to be any difference. I expect the 
tests would fail if there was a problem.


Line 224: KuduTable.KUDU_STORAGE_HANDLER.equals(
        :             tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER)
> we should just make sure it's OK with others if we don't support the Hive s
I think this is closer to the standard syntax.


Line 236: fileFormat_ != THdfsFileFormat.KUDU
> can we only throw for kudu if not external?
That's possible but keeping the Kudu logic together as much as possible seems 
better. For example maybe there are some cases where avro tables require 
columns too, it might get messy having all the details here.


Line 298: rowFormat_ = RowFormat.DEFAULT_ROW_FORMAT;
> I'm not sure if this is OK. If reset() gets called and then re-analyze, I t
Ah I hadn't considered reset(). Keeping null for the row format seems better 
since it's really just a text table thing. The row format is actually optional 
in TCreateTableParams. Maybe that will work. Are you ok with that?


Line 304: AnalysisException
> do we need this as well as the exception on l229?
They are both needed but could be moved closer together. I'm fine either way.


Line 307: tblProperties_
> Same concern about re-analysis since we don't have the original state and t
Hmm, ya maybe I'll keep the properties set during analysis in a separate place 
that can be cleared during reset().


Line 313: location
> what location is this referring to? I'm not sure what this comment is about
Updated the TODO. Let me know if it still doesn't make sense.


Line 347:  
> nit: extra space for our java style
ridiculous


Line 349: TODO: Investigate what happens if more than one DistributeComponent
        :           //       without any columns is specified. Maybe reject 
that here?
> can you add a test case for this?
It's a TODO?


Line 362: throwIfExists
> how about throwIfNotEmpty to avoid confusion with the above fn?
Changed to throwIfNotNullOrNotEmpty()


Line 358:   private void throwIfExists(Object o, String message) throws 
AnalysisException {
        :     if (o != null) throw new AnalysisException(message);
        :   }
        : 
        :   private void throwIfExists(Collection c, String message) throws 
AnalysisException {
        :     if (c != null && !c.isEmpty()) throw new 
AnalysisException(message);
        :   }
        : 
        :   private void throwIfNotExists(Object o, String message) throws 
AnalysisException {
        :     if (o == null) throw new AnalysisException(message);
        :   }
> can you move these to static methods on a utility class, maybe AnalysisUtil
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 153: private
> brief comment
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java
File fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java:

Line 64: in not
> extra word
Done


Line 63: If the table does not exist
       :    * or in not managed an exception will not be thrown
> Seems like an implementation detail given this is an abstract method, and i
I left it as is. The sentences list the information in order of importance.

Maybe it's not clear and I can reword this, but subclasses must adhere to this 
description. Ex, they must not throw if the table doesn't exist.


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
File 
fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java:

Line 51: REQUIRED_COLUMN_DEF
> comment that this is to satisfy thrift?
Renamed it to avoid the comment.


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 1129: For
         :     // now Impala will assume that any table not in its cache also 
doesn't require the
         :     // use of a DdlDelegate.
> So it sounds like there are 3(?) concerning cases:
Yes the Kudu tables would remain but the database metadata in Hive would be 
gone. Added to the comment.


Line 1146:         
> 4 spaces
Done


Line 1533: distributeBy
> nit: add this to the header comment too
Done. I didn't put much effort into this since it doesn't really make sense. 
I'll clean it up later.


Line 1537: populating columns and such
> By mutating newTable directly? If so, can you say that.
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/fe/src/main/java/com/cloudera/impala/util/KuduClient.java
File fe/src/main/java/com/cloudera/impala/util/KuduClient.java:

Line 34: public class KuduClient implements AutoCloseable{
> nit space
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

Line 80:   
> nit extra space
Done


Line 127: ensure_kudu_installed_if_possible
> ensure_X_if_possible is funny. How about try_install_kudu?
renamed


Line 129: If the toolchain isn't in use or hasn't been populated
        :      yet, nothing will be done.
> In most cases (at least a fresh env calling buildall.sh) the toolchain shou
I'm not sure why you bring up "most cases" since it needs to work in all cases. 
The toolchain bootstrap depends on the virtualenv so this really shouldn't run 
the first time around.


Line 131: able
> supported?
Shoould have been "available"


Line 176: s
> extra s
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

Line 56: Cython == 0.23.4
> does the order matter? if not, can you put this special kudu stuff at the b
Oops I meant to indent cython/numpy. They are below to show the dependency.


http://gerrit.cloudera.org:8080/#/c/2617/3/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test:

Line 9: )
> It's not ideal that this create table result statement will not be executab
Ya the create managed table part will be done later. It's just in a weird state 
now.


Line 11: ====
> is this just whitespace?
I'm not sure what going on here. The file seems fine locally. No extra line or 
spaces.


http://gerrit.cloudera.org:8080/#/c/2617/3/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

Line 53: if not cls.__DB_NAME:
       :       cls.__DB_NAME = \
       :           choice(ascii_lowercase) + "".join(sample(ascii_lowercase + 
digits, 5))
       :     return cls.__DB_NAME
> potential race if tests are run in parallel? why does the db_name need to b
Yes there is a race. I need to test this more. This needs to be its own thing 
because the method has an external caller.


Line 59:   def random_table_name(cls):
> get_
Left as is. If anything I'd change it to generate_. I think it reads fine 
though.


Line 78:          assert not kudu.table_exists(kudu_table.name)
> this line needs to be in the with block
It's correct as-is. The Kudu table won't exist outside the block and that's 
what this line is showing.


Line 103: drop_impala_table_after_context
> comment please, not immediately clear how to use this
Done


http://gerrit.cloudera.org:8080/#/c/2617/3/tests/conftest.py
File tests/conftest.py:

Line 264: conn
> Is all this stuff needed in this review? I wasn't sure what's going on with
The tests use this stuff. It seems pretty small compared to the other stuff. 
Anyhow I can check in the test infra first if you like. I'm not sure if there 
would be tests to go along with the infra though. The tests would be in the 
following commit.


-- 
To view, visit http://gerrit.cloudera.org:8080/2617
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

Reply via email to