[GitHub] [hudi] satishkotha commented on a change in pull request #2532: [HUDI-1534]HiveSyncTool-It is not necessary to use JDBC and MetaStoreClient at the same time

GitBox Thu, 18 Feb 2021 15:04:21 -0800


satishkotha commented on a change in pull request #2532:
URL: https://github.com/apache/hudi/pull/2532#discussion_r578798873




##########
File path: 
hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/util/DLASchemaUtil.java
##########
@@ -0,0 +1,432 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.dla.util;
+
+import org.apache.hudi.hive.HiveSyncConfig;
+import org.apache.hudi.hive.HoodieHiveSyncException;
+import org.apache.hudi.sync.common.SchemaDifference;
+import org.apache.hudi.hive.util.ColumnNameXLator;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.parquet.schema.DecimalMetadata;
+import org.apache.parquet.schema.GroupType;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.OriginalType;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Type;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.ArrayList;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Schema Utilities.

Review comment:
       This looks identical to HiveSchemaUtil. Can you help me understand why 
we need this separate util class? Is 'tickSupport' only the difference? Do you 
think we can pass that as a flag/config to avoid code duplication?

##########
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##########
@@ -87,30 +89,30 @@ public void testSchemaConvertArray() throws IOException {
         .named("ArrayOfInts");
 
     String schemaString = HiveSchemaUtil.generateSchemaString(schema);
-    assertEquals("`int_list` ARRAY< int>", schemaString);
+    assertEquals("int_list ARRAY<int>", schemaString);

Review comment:
       Can you add test coverage for DLASchemaUtil#generateSchemaString (and 
other methods in DLASchemaUtil?)

##########
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -64,32 +56,23 @@
 public class HoodieHiveClient extends AbstractSyncHoodieClient {
 
   private static final String HOODIE_LAST_COMMIT_TIME_SYNC = 
"last_commit_time_sync";
-  private static final String HIVE_ESCAPE_CHARACTER = 
HiveSchemaUtil.HIVE_ESCAPE_CHARACTER;
 
   private static final Logger LOG = 
LogManager.getLogger(HoodieHiveClient.class);
   private final PartitionValueExtractor partitionValueExtractor;
   private IMetaStoreClient client;
   private HiveSyncConfig syncConfig;
   private FileSystem fs;
-  private Connection connection;
   private HoodieTimeline activeTimeline;
-  private HiveConf configuration;
 
   public HoodieHiveClient(HiveSyncConfig cfg, HiveConf configuration, 
FileSystem fs) {
     super(cfg.basePath, cfg.assumeDatePartitioning, 
cfg.useFileListingFromMetadata, cfg.verifyMetadataFileListing, fs);
     this.syncConfig = cfg;
     this.fs = fs;
 
-    this.configuration = configuration;
-    // Support both JDBC and metastore based implementations for backwards 
compatiblity. Future users should
-    // disable jdbc and depend on metastore client for all hive registrations
-    if (cfg.useJdbc) {

Review comment:
       Seems like this is good to have backward compatibility and can be 
disabled using config easily (we can also set it to false by default?) 
   
   
   Do you think this adds a lot of overhead? Is this change discussed in open 
source meetup/email lists?

##########
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -343,116 +304,48 @@ public boolean doesTableExist(String tableName) {
    */
   public boolean doesDataBaseExist(String databaseName) {
     try {
-      Database database = client.getDatabase(databaseName);
-      if (database != null && databaseName.equals(database.getName())) {
-        return true;
-      }
+      List<String> databases = client.getAllDatabases();
+      return databases.contains(databaseName.toLowerCase());
     } catch (TException e) {
+      LOG.error("Failed to check if database exists " + databaseName, e);
       throw new HoodieHiveSyncException("Failed to check if database exists " 
+ databaseName, e);
     }
-    return false;
   }
 
-  /**
-   * Execute a update in hive metastore with this SQL.
-   *
-   * @param s SQL to execute
-   */
-  public void updateHiveSQL(String s) {
-    if (syncConfig.useJdbc) {
-      Statement stmt = null;
-      try {
-        stmt = connection.createStatement();
-        LOG.info("Executing SQL " + s);
-        stmt.execute(s);
-      } catch (SQLException e) {
-        throw new HoodieHiveSyncException("Failed in executing SQL " + s, e);
-      } finally {
-        closeQuietly(null, stmt);
-      }
-    } else {
-      updateHiveSQLUsingHiveDriver(s);
+  public void createDataBase(String databaseName, String location, String 
description) {

Review comment:
       nit:  Database instead of DataBase everywhere? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] satishkotha commented on a change in pull request #2532: [HUDI-1534]HiveSyncTool-It is not necessary to use JDBC and MetaStoreClient at the same time

Reply via email to