Matthew Jacobs has posted comments on this change.

Change subject: Simplify creating external Kudu tables and add DROP DATABASE 
CASCADE
......................................................................


Patch Set 7:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtil.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtil.java:

Line 1: package com.cloudera.impala.analysis;
legal header


http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 124:     return new EqualsBuilder()
thanks!


http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 151: getRowFormat
does this need to be public?


Line 152: rowFormat_ != null ? rowFormat_
        :         : fileFormat_ == THdfsFileFormat.TEXT ? 
RowFormat.DEFAULT_ROW_FORMAT : null;
can you change the first ternary to if/else for readability, and in case it is 
not null, can you add a Precondition that fileFormat is TEXT?


Line 224: fileFormat_ != THdfsFileFormat.KUDU &&
        :         KuduTable.KUDU_STORAGE_HANDLER.equals(
        :             tblProperties_.get(KuduTable.KEY_STORAGE_HANDLER))
This allows the Kudu storage handler to be set if the fileFormat_ is KUDU. Is 
that what you wanted?


Line 237: fileFormat_ != THdfsFileFormat.KUDU
Following up on my comment from an earlier revision. If we don't check that 
this only holds for external kudu tables, then it's valid to create a managed 
(non-external) kudu table without columns defined, which we shouldn't allow. 
(Or perhaps there's something else that's failing somewhere but it'd be harder 
to understand the issue.) I think we should just be handling the isExternal_ 
here.


Line 298: throw new AnalysisException(String.format(
        :           "ROW FORMAT cannot be specified for file format %s.", 
fileFormat_));
can you check if ff != TEXT sooner to throw this, then same on the indentation 
for the bulk of this fn


Line 308: tblProperties_
Re-analyze still seems broken...


Line 331: columnDefs_
In the case of reanalysis (reset() -> analysis()) we'll add duplicates of these 
dummy column defs which I think would break  if we ever end up reanalyzing 
twice bc it'd have duplicate col names.

I guess you can add it iff it's not already in there, but I think you'd have to 
walk through the list since it's possible for users to specify the cols as 
well. (Or do you want to disallow that?)


Line 335: keyCols
I think we might want some additional analysis on the keys, e.g., are the key 
cols valid cols?


http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
File 
fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java:

Line 160: Preconditions.checkState(
        :         
cols.get(0).getName().equals(REQUIRED_THRIFT_COLUMN_DEF.getColName()));
I didn't think we prevented users from creating an external table with columns 
defined? In fact, I was thinking you might want to do that so you can define a 
subset of the kudu columns, perhaps. I guess based on the code below we just 
won't support it. That's fine, but maybe we should throw an analysis exception 
in CreateTableStmt


Line 164: kuduTable.getSchema().getColumns()
do we know if this comes back in a consistent order? Given we flatten the names 
into a string we may want to make sure it's a consistent order if we ever need 
to compare KEY_COLUMNS


Line 194: com.cloudera.impala.catalog.
I think you can remove the namespace as it's clear from l197 what's going on


Line 204: csb.build();
nit: just put this directly on l205 and save a line


Line 213:  
nit: space
(don't shoot the messenger!)


http://gerrit.cloudera.org:8080/#/c/2617/7/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 1126: s
scenario


Line 1131: If that's assumption isn't
This doesn't make sense. that's -> that?


Line 1121: // The use of DdlDelegates combined with the fact that the catalog 
is disconnected
         :     // from the source of truth, the Hive metastore, leads to some 
complications here.
         :     // If 'db' is null, that should mean either the database doesn't 
really exist or the
         :     // database was created outside of Impala and the user didn't 
run "invalidate
         :     // metadata". If 'db' is not null, there still may be tables in 
the metastore that
         :     // would require a DdlDelegate. Another scenarios is when the 
user creates tables
         :     // that reqire a DdlDelegate in Impala, then drops the database 
in Hive. Ideally the
         :     // user visible result of DROP DATABASE CASCADE would be the 
same in any case. The
         :     // best solution is probably to move the DdlDelegate 
functionality into Hive. For
         :     // now Impala will assume that any table not in its cache also 
doesn't require the
         :     // use of a DdlDelegate. If that's assumption isn't correct, the 
database will still
         :     // be dropped in the metastore but the underlying data would 
remain.
this is getting big, how about creating a JIRA to track this Hive/Impala 
integration consideration. I think it's worthwhile since field folks may need 
to think about this or encounter this, so filing a JIRA now would be useful. 
Then we can put most of this text there and reference the JIRA w/ just a brief 
comment.


Line 1132:     // be dropped in the metastore but the underlying data would 
remain.
Can you add that users can be sure to avoid this by calling refresh before 
dropping a database?


Line 1159: the user is expected to ALTER TABLE to either set the
         :       // table to UNMANAGED or set the format to one that doesn't 
require a delegate.
we should track this somehow so we can eventually get it in documentation. do 
you have a list of usability/doc concerns? if not, it's probably worth starting 
something


Line 1541: setDistributeParams(distributeBy)
This is only relevant to Kudu, so it'd be nice if we could have the 
KuduDdlDelegate take this in its constructor so that the base class doesn't 
need to have this parameter/getter/setter which is specific to a specific 
implementation.


Line 1540: DdlDelegate ddlDelegate = createDdlDelegate(newTable)
         :         .setDistributeParams(distributeBy);
         :     ddlDelegate.createTable();
Marcel had said in the sync-up about doing this only when you had a real 
delegate (i.e. not having a "no-op delegate")?


Line 1558: ddlDelegate.dropTable();
This would be affected as well


http://gerrit.cloudera.org:8080/#/c/2617/7/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test:

Line 10: STORED AS KUDU
what about distribution parameters?


http://gerrit.cloudera.org:8080/#/c/2617/7/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

Line 52: def get_db_name(cls):
       :     if not cls.__DB_NAME:
       :       cls.__DB_NAME = \
       :           choice(ascii_lowercase) + "".join(sample(ascii_lowercase + 
digits, 5))
       :     return cls.__DB_NAME
what about handling the race?


-- 
To view, visit http://gerrit.cloudera.org:8080/2617
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

Reply via email to