Casey Ching has posted comments on this change.

Change subject: Simplify creating external Kudu tables and add DROP DATABASE 
CASCADE
......................................................................


Patch Set 9:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java:

Line 125: this.
> You can get rid off 'this'.
Done


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 151: RowFormat getRowFormat() {
> Private? public?
Package access is what is needed.


Line 153: fileFormat_ == null
> Out of curiosity, when does this happen?
Hmm maybe it's not needed. Removed it.


Line 193: getRowFormat()
> rowFormat
Done


Line 228: // The old way of specifying a Kudu table was to use the storage 
handler table
        :       // property. The new way is to specify the file format. 
Analysis is simpler if
        :       // the old way is rejected.
> Not sure if this comment is particularly useful.
Removed


Line 303: analyzeKudu
> maybe analyzeKuduStmt or analyzeKuduProperties?
Change to analyzeKuduFormat


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 153: address
> nit: addresses
Done


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
File 
fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java:

Line 66: private String getTableName() {
       :     return msTbl_.getTableName();
       :   }
> Single line?
Done


Line 96: if (Strings.isNullOrEmpty(masterAddrs)) {
       :       throw new ImpalaRuntimeException(String.format(
       :           "Table property '%s' is required when catalogd startup flag 
" +
       :           "-kudu_master_addrs is not used.", 
KuduTable.KEY_MASTER_ADDRESSES));
       :     }
> Do we need this check this here? Shouldn't we be checking the default maste
I think it's needed late/here, so there won't be a startup error if Kudu isn't 
used. Were you thinking there would always be a default given or did I 
misunderstand?


Line 119: explicitly or implicitly
> What does that mean? Do you mean fully qualified (e.g. db.table_name)?
Updated comment, let me know if it still doesn't make sense.


Line 128: new ArrayList<>(2)
> Lists.newArrayList(2);
Changed but I think we should stop using third-party stuff for features 
supported directly by the language.


Line 183: if (kudu.tableExists(kuduTableName)) {
> What will happen if I try to create a managed kudu table and the kudu table
Right now that is not allowed during analysis. A managed table must have an 
explicit Kudu table name given. In the future, the plan is to use [<db 
name>.]<hive metastore name> as the Kudu table name.


Line 234: getKuduTableName()
> Isn't that the same as the kuduTableName?
Ah yes.


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 1115: fetched
> "from the Hive MetaStore".
Done


Line 1115: The metadata contained in the Metastore Table class is
         :     // enough to identify the format. A direct RPC against the 
Metastore is used to avoid
         :     // loading unnecessary partition data and statistics as would be 
done through the
         :     // CatalogServiceCatalog.
> I think you can remove this part of the comment.
Done


Line 1119: new ArrayList<>();
> Use guava: Lists.newArrayList(); also in L1120
Changed but I don't understand why people still use that. This diamond feature 
was added so stuff like Lists.newArrayList() wouldn't be needed.


Line 1130: assume that any table not in its cache also doesn't require the
         :     // use of a DdlDelegate
> Maybe I've missed something, but I am not sure the code below reflects that
Oh right, I was thinking non-UnsupportedDdlDelegate. Updated the comment.


Line 1133: an
> a
Done


Line 1121: The use of DdlDelegates combined with the fact that the catalog is 
disconnected
         :     // from the source of truth, the Hive metastore, leads to some 
complications here.
         :     // If 'db' is null, that should mean either the database doesn't 
really exist or the
         :     // database was created outside of Impala and the user didn't 
run "invalidate
         :     // metadata". If 'db' is not null, there still may be tables in 
the metastore that
         :     // would require a DdlDelegate. Another scenario is when the 
user creates tables
         :     // that reqire a DdlDelegate in Impala, then drops the database 
in Hive. Ideally the
         :     // user visible result of DROP DATABASE CASCADE would be the 
same in any case. The
         :     // best solution is probably to move the DdlDelegate 
functionality into Hive. For
         :     // now Impala will assume that any table not in its cache also 
doesn't require the
         :     // use of a DdlDelegate. If that assumption isn't correct, the 
database will still
         :     // be dropped in the metastore but the underlying data would 
remain. Users can issue
         :     // an REFRESH command to load the database metadata before 
dropping to ensure
         :     // delegates will be used when needed.
> Thanks for filing the JIRA. What about removing all the text here (leaving 
Hmm that seems like a bad practice. The code should make sense on its own, 
without needing to look up notes about the code in some website.


Line 1132: Users can issue
         :     // an REFRESH command to load the database metadata before 
dropping to ensure
         :     // delegates will be used when needed.
> Isn't that what is essentially happening in L1148?
Oh the really bad case is when db == null then the metadata is never fetched at 
all.


Line 1145: if (!incompleteTableNames.isEmpty()) {
         :       MetaStoreClient msClient = catalog_.getMetaStoreClient();
         :       try {
         :         
msTables.addAll(msClient.getHiveClient().getTableObjectsByName(
         :             db.getName(), incompleteTableNames));
         :       } catch (TException e) {
         :         throw new ImpalaRuntimeException(
         :             String.format(HMS_RPC_ERROR_FORMAT_STR, 
"getTableObjectsByName"), e);
         :       } finally {
         :         msClient.release();
         :       }
         :     }
> I think this block will only be executed in the db != null case. Why don't 
Ah right, thanks


Line 1158: createDdlDelegate(msTable)
> My understanding is that msTables contains all the tables that haven't been
Yes, I think with the DDL delegate system, we are supposed to assume that all 
tables could need a delegate.


Line 1542: DdlDelegate ddlDelegate = createDdlDelegate(newTable)
         :         .setDistributeParams(distributeBy);
         :     ddlDelegate.createTable();
> It feels weird to be calling a DdlDelegate even for tables that don't need 
Since the DDL delegate stuff will be changing I'm holding off on this. I added 
a TODO to investigate the IF NOT EXISTS.


Line 2731: sd != null
> In which cases sd is null?
Hmm, that doesn't seem possible. Not sure what happened.


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/service/JniCatalog.java
File fe/src/main/java/com/cloudera/impala/service/JniCatalog.java:

Line 78: defaultKuduMasterAddrs
> Why 'default'? Couldn't it be any Kudu master addrs specified through the s
Ya its the same. This represents the default value the catalog should use. I'm 
not sure I follow.


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/util/KuduClient.java
File fe/src/main/java/com/cloudera/impala/util/KuduClient.java:

Line 34: public class KuduClient
> I wish we could keep it in sync with the org.kududb.client API. Maybe if th
There should be a compile time error if the API changes. But ya it would be 
nice if there was an interface.


Line 46:  
> nit: maybe "Kudu table" here and in all the other error messages?
Done


http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java
File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java:

Line 238: Kudu type %s is not supported in Impala
> Is that error message accurate? I think Impala supports a superset of the d
It's right. For example there is a binary type in Kudu.


-- 
To view, visit http://gerrit.cloudera.org:8080/2617
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Casey Ching <[email protected]>
Gerrit-Reviewer: Casey Ching <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

Reply via email to