[
https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=838278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-838278
]
ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Jan/23 10:00
Start Date: 10/Jan/23 10:00
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1065571201
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -515,6 +529,803 @@ public List<String> getMaterializedViewsForRewriting(String dbName) throws MetaE
}
}
+ private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+ ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+ AbstractClassMetaData cmd = ec.getMetaDataManager().getMetaDataForClass(modelClass, ec.getClassLoaderResolver());
+ if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+ return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, cmd, -1);
+ } else {
+ throw new MetaException("Identity type is not datastore.");
+ }
+ }
+
+ /**
+ * Interface to execute multiple row insert query in batch for direct SQL
+ */
+ interface BatchExecutionContext {
+ void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException;
+ }
+
+ private void insertInBatch(String tableName, String columns, int columnCount, String rowFormat, int rowCount,
+ BatchExecutionContext bec) throws MetaException {
+ if (rowCount == 0 || columnCount == 0) {
+ return;
+ }
+ int maxParamsCount = maxParamsInInsert;
+ if (maxParamsCount < columnCount) {
+ LOG.error("Maximum number of parameters in the direct SQL batch insert query is less than the table: {}"
+ + " columns. Executing single row insert queries.", tableName);
+ maxParamsCount = columnCount;
+ }
+ int maxRowsInBatch = maxParamsCount / columnCount;
+ int maxBatches = rowCount / maxRowsInBatch;
+ int last = rowCount % maxRowsInBatch;
+ String query = "";
+ if (maxBatches > 0) {
+ query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+ }
+ int batchParamCount = maxRowsInBatch * columnCount;
+ for (int batch = 0; batch < maxBatches; batch++) {
+ bec.execute(query, maxRowsInBatch, batchParamCount);
+ }
+ if (last != 0) {
+ query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+ bec.execute(query, last, last * columnCount);
+ }
+ }
+
+ private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) throws MetaException {
+ int rowCount = serdeIdToSerDeInfo.size();
+ String columns = "(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+ + "\"SERIALIZER_CLASS\")";
+ String row = "(?,?,?,?,?,?,?)";
+ int columnCount = 7;
+ BatchExecutionContext bec = new BatchExecutionContext() {
Review Comment:
There is already a class, Batchable, that serves the same purpose:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java
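The batching arithmetic in `insertInBatch` can be checked in isolation. The following is a minimal standalone sketch, not the PR's code: `BatchCallback` stands in for `BatchExecutionContext`, `MetaException` is dropped, and `buildInsert` is a hypothetical replacement for the database-specific `dbType.getBatchInsertQuery`, assuming the generic multi-row `INSERT ... VALUES (...),(...)` form, which not every database accepts.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Standalone sketch of the batch-splitting logic in insertInBatch.
 * BatchCallback mirrors BatchExecutionContext; buildInsert is hypothetical.
 */
public class InsertBatchingSketch {

  /** Mirrors BatchExecutionContext: executes one multi-row INSERT. */
  interface BatchCallback {
    void execute(String query, int rowCount, int paramCount);
  }

  /** Hypothetical builder: INSERT INTO t (cols) VALUES (?,?),(?,?),... */
  static String buildInsert(String table, String columns, String rowFormat, int rows) {
    StringBuilder sb = new StringBuilder("INSERT INTO ")
        .append(table).append(' ').append(columns).append(" VALUES ");
    for (int i = 0; i < rows; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(rowFormat);
    }
    return sb.toString();
  }

  static void insertInBatch(String table, String columns, int columnCount,
      String rowFormat, int rowCount, int maxParams, BatchCallback cb) {
    if (rowCount == 0 || columnCount == 0) {
      return;
    }
    // A statement must hold at least one complete row (the PR logs an error here).
    int maxParamsCount = Math.max(maxParams, columnCount);
    int maxRowsInBatch = maxParamsCount / columnCount;
    int fullBatches = rowCount / maxRowsInBatch;
    int last = rowCount % maxRowsInBatch;
    if (fullBatches > 0) {
      // All full batches share one query text, built once.
      String query = buildInsert(table, columns, rowFormat, maxRowsInBatch);
      for (int b = 0; b < fullBatches; b++) {
        cb.execute(query, maxRowsInBatch, maxRowsInBatch * columnCount);
      }
    }
    if (last != 0) {
      // Remaining rows go into one shorter statement.
      cb.execute(buildInsert(table, columns, rowFormat, last), last, last * columnCount);
    }
  }

  public static void main(String[] args) {
    List<Integer> batchRows = new ArrayList<>();
    // 11 rows, 3 columns, at most 7 parameters per statement.
    insertInBatch("T", "(\"A\",\"B\",\"C\")", 3, "(?,?,?)", 11, 7,
        (q, rows, params) -> batchRows.add(rows));
    System.out.println(batchRows); // prints [2, 2, 2, 2, 2, 1]
  }
}
```

With 11 rows, 3 columns and a 7-parameter cap, each statement holds 7 / 3 = 2 rows, so the callback sees five full batches and a final batch of one row. At most two distinct query texts are ever generated per table, however many rows are inserted.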
##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -753,6 +755,12 @@ public enum ConfVars {
"SQL. For some DBs like Oracle and MSSQL, there are hardcoded or perf-based limitations\n" +
"that necessitate this. For DBs that can handle the queries, this isn't necessary and\n" +
"may impede performance. -1 means no batching, 0 means automatic batching."),
+ DIRECT_SQL_MAX_PARAMS_IN_INSERT("metastore.direct.sql.max.parameters.in.insert",
Review Comment:
You can reuse the existing property `metastore.direct.sql.batch.size`.
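Reusing `metastore.direct.sql.batch.size` means the insert path has to honour the three cases in that property's description (-1 means no batching, 0 means automatic batching, otherwise an explicit size). A minimal sketch of that mapping, assuming the value is read as rows per multi-row INSERT (the PR's own setting counted bind parameters instead) and that `autoRows` is a hypothetical per-database default; neither assumption is confirmed by the review.

```java
/**
 * Hypothetical sketch, not Hive code: derives rows per multi-row INSERT from
 * a value following the documented semantics of metastore.direct.sql.batch.size.
 */
public class InsertBatchSizeSketch {

  /**
   * @param batchSize configured value of metastore.direct.sql.batch.size
   * @param autoRows  hypothetical database-specific default used when batchSize == 0
   * @param rowCount  total number of rows to insert
   * @return rows per INSERT statement, never less than 1
   */
  static int rowsPerInsert(int batchSize, int autoRows, int rowCount) {
    int rows;
    if (batchSize < 0) {
      rows = rowCount;   // -1: no batching, one statement for all rows
    } else if (batchSize == 0) {
      rows = autoRows;   // 0: automatic, caller supplies a per-database value
    } else {
      rows = batchSize;  // explicit positive batch size
    }
    return Math.max(1, rows);
  }

  public static void main(String[] args) {
    System.out.println(rowsPerInsert(-1, 100, 250)); // 250
    System.out.println(rowsPerInsert(0, 100, 250));  // 100
    System.out.println(rowsPerInsert(50, 100, 250)); // 50
  }
}
```

If the value is read as rows, the caller still has to keep rows × columns under the database's bind-parameter limit, which is what the separate max-parameters setting in the PR was guarding against.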
##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -692,6 +692,8 @@ public enum ConfVars {
"Default transaction isolation level for identity generation."),
DATANUCLEUS_USE_LEGACY_VALUE_STRATEGY("datanucleus.rdbms.useLegacyNativeValueStrategy", "datanucleus.rdbms.useLegacyNativeValueStrategy", true, ""),
+ DATANUCLEUS_QUERY_SQL_ALLOWALL("datanucleus.query.sql.allowAll", "datanucleus.query.sql.allowAll",
Review Comment:
It seems this property is not used anywhere.
Issue Time Tracking
-------------------
Worklog Id: (was: 838278)
Time Spent: 1.5h (was: 1h 20m)
> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move to direct SQL. Many repeated
> SQL statements could be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)