[
https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=840810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-840810
]
ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jan/23 16:32
Start Date: 21/Jan/23 16:32
Worklog Time Spent: 10m
Work Description: VenuReddy2103 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1083308365
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -515,6 +529,803 @@ public List<String> getMaterializedViewsForRewriting(String dbName) throws MetaE
     }
   }
+  private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+    ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+    AbstractClassMetaData cmd = ec.getMetaDataManager().getMetaDataForClass(modelClass, ec.getClassLoaderResolver());
+    if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+      return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, cmd, -1);
+    } else {
+      throw new MetaException("Identity type is not datastore.");
+    }
+  }
+
+  /**
+   * Interface to execute multiple row insert query in batch for direct SQL
+   */
+  interface BatchExecutionContext {
+    void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException;
+  }
+
+  private void insertInBatch(String tableName, String columns, int columnCount, String rowFormat, int rowCount,
+      BatchExecutionContext bec) throws MetaException {
+    if (rowCount == 0 || columnCount == 0) {
+      return;
+    }
+    int maxParamsCount = maxParamsInInsert;
+    if (maxParamsCount < columnCount) {
+      LOG.error("Maximum number of parameters in the direct SQL batch insert query is less than the table: {}"
+          + " columns. Executing single row insert queries.", tableName);
+      maxParamsCount = columnCount;
+    }
+    int maxRowsInBatch = maxParamsCount / columnCount;
+    int maxBatches = rowCount / maxRowsInBatch;
+    int last = rowCount % maxRowsInBatch;
+    String query = "";
+    if (maxBatches > 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+    }
+    int batchParamCount = maxRowsInBatch * columnCount;
+    for (int batch = 0; batch < maxBatches; batch++) {
+      bec.execute(query, maxRowsInBatch, batchParamCount);
+    }
+    if (last != 0) {
+      query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+      bec.execute(query, last, last * columnCount);
+    }
+  }
+
+  private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) throws MetaException {
+    int rowCount = serdeIdToSerDeInfo.size();
+    String columns = "(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+        + "\"SERIALIZER_CLASS\")";
+    String row = "(?,?,?,?,?,?,?)";
+    int columnCount = 7;
+    BatchExecutionContext bec = new BatchExecutionContext() {
Review Comment:
Actually, Batchable.runBatched() expects the input in the form of a list. It is
used when the input is a list of partition names/ids or column names. But in
this case, the objects to insert are not available as a list, so a new
interface was defined, local to this file's scope.
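For illustration only (not part of the PR): a minimal sketch of this
callback-based batching pattern. Only the names BatchExecutionContext and
insertInBatch come from the diff; the class, method, and Map value type below
are hypothetical, and the actual statement execution is elided.
{code:java}
import java.util.Iterator;
import java.util.Map;

public class BatchInsertSketch {

  // Mirrors the interface from the diff, simplified here to throw Exception.
  interface BatchExecutionContext {
    void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws Exception;
  }

  // Rows arrive as a Map rather than a List, so each batch advances a shared
  // iterator by batchRowCount entries instead of slicing a pre-built list.
  static void insertSerdesSketch(Map<Long, String[]> serdeRows) throws Exception {
    Iterator<Map.Entry<Long, String[]>> it = serdeRows.entrySet().iterator();
    BatchExecutionContext bec = (query, rowCount, paramCount) -> {
      Object[] params = new Object[paramCount];
      int idx = 0;
      for (int r = 0; r < rowCount; r++) {
        Map.Entry<Long, String[]> row = it.next();
        params[idx++] = row.getKey();          // first column, e.g. SERDE_ID
        for (String col : row.getValue()) {    // remaining columns of this row
          params[idx++] = col;
        }
      }
      // A real implementation would bind `params` to the '?' placeholders of
      // batchQueryText and execute the multi-row INSERT here.
    };
    // insertInBatch(...) from the diff would then split serdeRows.size() rows
    // into batches and call bec.execute(...) once per batch.
  }
}
{code}
The trade-off is that the callback pulls rows from whatever source the caller
has (here an iterator over a Map), whereas Batchable.runBatched() would require
materializing the rows as a List first.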
Issue Time Tracking
-------------------
Worklog Id: (was: 840810)
Time Spent: 3.5h (was: 3h 20m)
> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Venugopal Reddy K
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is super slow for a large
> number of partitions. It would be good to move to direct SQL. Lots of repeated
> SQLs can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)