[jira] [Work logged] (HIVE-26035) Explore moving to directsql for ObjectStore::addPartitions

ASF GitHub Bot (Jira) Tue, 10 Jan 2023 02:01:39 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=838278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-838278
 ]


ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jan/23 10:00
            Start Date: 10/Jan/23 10:00
    Worklog Time Spent: 10m 
      Work Description: dengzhhu653 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1065571201


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -515,6 +529,803 @@ public List<String> 
getMaterializedViewsForRewriting(String dbName) throws MetaE
     }
   }
 
+  private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+    ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+    AbstractClassMetaData cmd = 
ec.getMetaDataManager().getMetaDataForClass(modelClass, 
ec.getClassLoaderResolver());
+    if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+      return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, 
cmd, -1);
+    } else {
+      throw new MetaException("Identity type is not datastore.");
+    }
+  }
+
+  /**
+   * Interface to execute multiple row insert query in batch for direct SQL
+   */
+  interface BatchExecutionContext {
+    void execute(String batchQueryText, int batchRowCount, int 
batchParamCount) throws MetaException;
+  }
+
+  private void insertInBatch(String tableName, String columns, int 
columnCount, String rowFormat, int rowCount,
+      BatchExecutionContext bec) throws MetaException {
+    if (rowCount == 0 || columnCount == 0) {
+      return;
+    }
+    int maxParamsCount = maxParamsInInsert;
+    if (maxParamsCount < columnCount) {
+      LOG.error("Maximum number of parameters in the direct SQL batch insert 
query is less than the table: {}"
+          + " columns. Executing single row insert queries.", tableName);
+      maxParamsCount = columnCount;
+    }
+    int maxRowsInBatch = maxParamsCount / columnCount;
+    int maxBatches = rowCount / maxRowsInBatch;
+    int last = rowCount % maxRowsInBatch;
+    String query = "";
+    if (maxBatches > 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, 
maxRowsInBatch);
+    }
+    int batchParamCount = maxRowsInBatch * columnCount;
+    for (int batch = 0; batch < maxBatches; batch++) {
+      bec.execute(query, maxRowsInBatch, batchParamCount);
+    }
+    if (last != 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+      bec.execute(query, last, last * columnCount);
+    }
+  }
+
+  private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) 
throws MetaException {
+    int rowCount = serdeIdToSerDeInfo.size();
+    String columns = 
"(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+        + "\"SERIALIZER_CLASS\")";
+    String row = "(?,?,?,?,?,?,?)";
+    int columnCount = 7;
+    BatchExecutionContext bec = new BatchExecutionContext() {

Review Comment:
   There has already a class named Batchable for the same purpose: 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java



##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -753,6 +755,12 @@ public enum ConfVars {
             "SQL. For some DBs like Oracle and MSSQL, there are hardcoded or 
perf-based limitations\n" +
             "that necessitate this. For DBs that can handle the queries, this 
isn't necessary and\n" +
             "may impede performance. -1 means no batching, 0 means automatic 
batching."),
+    
DIRECT_SQL_MAX_PARAMS_IN_INSERT("metastore.direct.sql.max.parameters.in.insert",

Review Comment:
   you can reuse the property: `metastore.direct.sql.batch.size`



##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -692,6 +692,8 @@ public enum ConfVars {
         "Default transaction isolation level for identity generation."),
     
DATANUCLEUS_USE_LEGACY_VALUE_STRATEGY("datanucleus.rdbms.useLegacyNativeValueStrategy",
         "datanucleus.rdbms.useLegacyNativeValueStrategy", true, ""),
+    DATANUCLEUS_QUERY_SQL_ALLOWALL("datanucleus.query.sql.allowAll", 
"datanucleus.query.sql.allowAll",

Review Comment:
   Seems we have no where to use this property.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 838278)
    Time Spent: 1.5h  (was: 1h 20m)

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses datanuclues and is super slow for large 
> number of partitions. It will be good to move to direct sql. Lots of repeated 
> SQLs can be avoided as well (e.g SDS, SERDE, TABLE_PARAMS)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Work logged] (HIVE-26035) Explore moving to directsql for ObjectStore::addPartitions

Reply via email to