Casey Ching has posted comments on this change. Change subject: Simplify creating external Kudu tables and add DROP DATABASE CASCADE ......................................................................
Patch Set 9: (28 comments) http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java: Line 125: this. > You can get rid off 'this'. Done http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 151: RowFormat getRowFormat() { > Private? public? Package access is what is needed. Line 153: fileFormat_ == null > Out of curiosity, when does this happen? Hmm maybe it's not needed. Removed it. Line 193: getRowFormat() > rowFormat Done Line 228: // The old way of specifying a Kudu table was to use the storage handler table : // property. The new way is to specify the file format. Analysis is simpler if : // the old way is rejected. > Not sure if this comment is particularly useful. Removed Line 303: analyzeKudu > maybe analyzeKuduStmt or analyzeKuduProperties? Change to analyzeKuduFormat http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: Line 153: address > nit: addresses Done http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java File fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java: Line 66: private String getTableName() { : return msTbl_.getTableName(); : } > Single line? Done Line 96: if (Strings.isNullOrEmpty(masterAddrs)) { : throw new ImpalaRuntimeException(String.format( : "Table property '%s' is required when catalogd startup flag " + : "-kudu_master_addrs is not used.", KuduTable.KEY_MASTER_ADDRESSES)); : } > Do we need this check this here? Shouldn't we be checking the default maste I think it's needed late/here, so there won't be a startup error if Kudu isn't used. Were you thinking there would always be a default given or did I misunderstand? Line 119: explicitly or implicitly > What does that mean? Do you mean fully qualified (e.g. db.table_name)? Updated comment, let me know if it still doesn't make sense. Line 128: new ArrayList<>(2) > Lists.newArrayList(2); Changed but I think we should stop using third-party stuff for features supported directly by the language. Line 183: if (kudu.tableExists(kuduTableName)) { > What will happen if I try to create a managed kudu table and the kudu table Right now that is not allowed during analysis. A managed table must have an explicit Kudu table name given. In the future, the plan is to use [<db name>.]<hive metastore name> as the Kudu table name. Line 234: getKuduTableName() > Isn't that the same as the kuduTableName? Ah yes. http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 1115: fetched > "from the Hive MetaStore". Done Line 1115: The metadata contained in the Metastore Table class is : // enough to identify the format. A direct RPC against the Metastore is used to avoid : // loading unnecessary partition data and statistics as would be done through the : // CatalogServiceCatalog. > I think you can remove this part of the comment. Done Line 1119: new ArrayList<>(); > Use guava: Lists.newArrayList(); also in L1120 Changed but I don't understand why people still use that. This diamond feature was added so stuff like Lists.newArrayList() wouldn't be needed. Line 1130: assume that any table not in its cache also doesn't require the : // use of a DdlDelegate > Maybe I've missed something, but I am not sure the code below reflects that Oh right, I was thinking non-UnsupportedDdlDelegate. Updated the comment. Line 1133: an > a Done Line 1121: The use of DdlDelegates combined with the fact that the catalog is disconnected : // from the source of truth, the Hive metastore, leads to some complications here. : // If 'db' is null, that should mean either the database doesn't really exist or the : // database was created outside of Impala and the user didn't run "invalidate : // metadata". If 'db' is not null, there still may be tables in the metastore that : // would require a DdlDelegate. Another scenario is when the user creates tables : // that reqire a DdlDelegate in Impala, then drops the database in Hive. Ideally the : // user visible result of DROP DATABASE CASCADE would be the same in any case. The : // best solution is probably to move the DdlDelegate functionality into Hive. For : // now Impala will assume that any table not in its cache also doesn't require the : // use of a DdlDelegate. If that assumption isn't correct, the database will still : // be dropped in the metastore but the underlying data would remain. Users can issue : // an REFRESH command to load the database metadata before dropping to ensure : // delegates will be used when needed. > Thanks for filing the JIRA. What about removing all the text here (leaving Hmm that seems like a bad practice. The code should make sense on its own, without needing to look up notes about the code in some website. Line 1132: Users can issue : // an REFRESH command to load the database metadata before dropping to ensure : // delegates will be used when needed. > Isn't that what is essentially happening in L1148? Oh the really bad case is when db == null then the metadata is never fetched at all. Line 1145: if (!incompleteTableNames.isEmpty()) { : MetaStoreClient msClient = catalog_.getMetaStoreClient(); : try { : msTables.addAll(msClient.getHiveClient().getTableObjectsByName( : db.getName(), incompleteTableNames)); : } catch (TException e) { : throw new ImpalaRuntimeException( : String.format(HMS_RPC_ERROR_FORMAT_STR, "getTableObjectsByName"), e); : } finally { : msClient.release(); : } : } > I think this block will only be executed in the db != null case. Why don't Ah right, thanks Line 1158: createDdlDelegate(msTable) > My understanding is that msTables contains all the tables that haven't been Yes, I think with the DDL delegate system, we are supposed to assume that all tables could need a delegate. Line 1542: DdlDelegate ddlDelegate = createDdlDelegate(newTable) : .setDistributeParams(distributeBy); : ddlDelegate.createTable(); > It feels weird to be calling a DdlDelegate even for tables that don't need Since the DDL delegate stuff will be changing I'm holding off on this. I added a TODO to investigate the IF NOT EXISTS. Line 2731: sd != null > In which cases sd is null? Hmm, that doesn't seem possible. Not sure what happened. http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/service/JniCatalog.java File fe/src/main/java/com/cloudera/impala/service/JniCatalog.java: Line 78: defaultKuduMasterAddrs > Why 'default'? Couldn't it be any Kudu master addrs specified through the s Ya its the same. This represents the default value the catalog should use. I'm not sure I follow. http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/util/KuduClient.java File fe/src/main/java/com/cloudera/impala/util/KuduClient.java: Line 34: public class KuduClient > I wish we could keep it in sync with the org.kududb.client API. Maybe if th There should be a compile time error if the API changes. But ya it would be nice if there was an interface. Line 46: > nit: maybe "Kudu table" here and in all the other error messages? Done http://gerrit.cloudera.org:8080/#/c/2617/9/fe/src/main/java/com/cloudera/impala/util/KuduUtil.java File fe/src/main/java/com/cloudera/impala/util/KuduUtil.java: Line 238: Kudu type %s is not supported in Impala > Is that error message accurate? I think Impala supports a superset of the d It's right. For example there is a binary type in Kudu. -- To view, visit http://gerrit.cloudera.org:8080/2617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic141102818b6dad3016181b179a14024d0ff709d Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Casey Ching <[email protected]> Gerrit-Reviewer: Casey Ching <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
