[ https://issues.apache.org/jira/browse/HIVE-26265?focusedWorklogId=782199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782199 ]
ASF GitHub Bot logged work on HIVE-26265:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Jun/22 01:10
Start Date: 17/Jun/22 01:10
Worklog Time Spent: 10m
Work Description: cmunkey commented on code in PR #3365:
URL: https://github.com/apache/hive/pull/3365#discussion_r899674228
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbortTxnHandler.java:
##########
@@ -39,6 +48,19 @@ public void handle(Context withinContext) throws Exception {
if (!ReplUtils.includeAcidTableInDump(withinContext.hiveConf)) {
return;
}
+
+ if (ReplUtils.filterTransactionOperations(withinContext.hiveConf)) {
+ String contextDbName = StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
+ GetTxnWriteIdsRequest request = new GetTxnWriteIdsRequest(eventMessage.getTxnId());
+ request.setDbName(contextDbName);
+ GetTxnWriteIdsResponse response = withinContext.db.getMSC().getTxnWriteIds(request);
Review Comment:
Moving this lookup into compilation raises two difficulties:
1. We would need to add a new field/status to AbortTxnEvent indicating
whether the aborted txn involved write IDs. REPL DUMP could then filter on
this field.
2. AbortTxnEvent is logged via TxnHandler.abort_txn(), so AbortTxnRequest
would need to be modified to pass the write ID, and AbortTxnRequest is a
Thrift object. Alternatively, abort_txn() could do the same HMS lookup that
AbortTxnHandler.handle() currently does.
Since this metastore call is made during REPL DUMP, would it be OK to accept
this inefficiency for now and replace it later with a more efficient
implementation?
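To make the trade-off in the review concrete, here is a minimal, hypothetical Java sketch (not Hive code). WriteIdLookup stands in for the getMSC().getTxnWriteIds(request) call in the diff, and AbortEvent.hasWriteIds models the new AbortTxnEvent field proposed in point 1. All class, method, and field names are illustrative assumptions.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Hive APIs) contrasting the per-event metastore lookup in
 * the diff with point 1 of the review: a flag carried on the abort event itself.
 */
public class AbortDumpDecision {

    /** Stand-in for withinContext.db.getMSC().getTxnWriteIds(request), scoped to one db. */
    public interface WriteIdLookup {
        List<Long> getTxnWriteIds(long txnId, String dbName);
    }

    /** Hypothetical abort event; hasWriteIds models the proposed new AbortTxnEvent field. */
    public static final class AbortEvent {
        public final long txnId;
        public final boolean hasWriteIds;

        public AbortEvent(long txnId, boolean hasWriteIds) {
            this.txnId = txnId;
            this.hasWriteIds = hasWriteIds;
        }
    }

    private final WriteIdLookup lookup;
    private int metastoreCalls = 0;

    public AbortDumpDecision(WriteIdLookup lookup) {
        this.lookup = lookup;
    }

    /** Current approach: one metastore round trip per abort event during REPL DUMP. */
    public boolean shouldDumpWithLookup(AbortEvent event, String dbName) {
        metastoreCalls++;
        return !lookup.getTxnWriteIds(event.txnId, dbName).isEmpty();
    }

    /** Point 1: decide from the event alone, with no metastore call. */
    public boolean shouldDumpWithFlag(AbortEvent event) {
        return event.hasWriteIds;
    }

    /** Number of (simulated) metastore calls made so far. */
    public int metastoreCalls() {
        return metastoreCalls;
    }
}
```

One design point the sketch surfaces: the lookup in the diff is scoped with setDbName(), while a per-txn flag as described in point 1 only records whether the txn involved write IDs at all, not which database they belong to.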
Issue Time Tracking
-------------------
Worklog Id: (was: 782199)
Time Spent: 40m (was: 0.5h)
> REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.
> --------------------------------------------------------------------
>
> Key: HIVE-26265
> URL: https://issues.apache.org/jira/browse/HIVE-26265
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: francis pang
> Assignee: francis pang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> REPL DUMP is replicating all OpenXacts, even when they are from other,
> non-replicated databases. This wastes space in the dump and ends up opening
> unneeded transactions during REPL LOAD.
>
> Add a config property for replication that filters out OpenXact events during
> REPL DUMP. During REPL LOAD, the txns can be opened implicitly when the
> ALLOC_WRITE_ID event is processed. For CommitTxn and AbortTxn, dump the event
> only if a write ID was allocated.
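The dump and load rules described in the issue can be modeled with a small, hypothetical Java sketch. The Type/Event classes, filterForDump(), and replay() are illustrative names rather than Hive classes. The txnsWithWriteIds set stands in for the per-txn metastore lookup shown in the diff, and the sketch assumes ALLOC_WRITE_ID events in the input already belong to the replicated database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the HIVE-26265 filtering rules; not Hive's event classes.
public class ReplFilterSketch {

    public enum Type { OPEN_TXN, ALLOC_WRITE_ID, COMMIT_TXN, ABORT_TXN }

    public static final class Event {
        public final Type type;
        public final long txnId;

        public Event(Type type, long txnId) {
            this.type = type;
            this.txnId = txnId;
        }

        @Override
        public String toString() {
            return type + "(" + txnId + ")";
        }
    }

    /**
     * Dump side: drop every OPEN_TXN and keep COMMIT_TXN/ABORT_TXN only for txns that
     * allocated a write ID in the replicated database. Assumes ALLOC_WRITE_ID events in
     * the input are already scoped to that database.
     */
    public static List<Event> filterForDump(List<Event> events, Set<Long> txnsWithWriteIds) {
        List<Event> dumped = new ArrayList<>();
        for (Event e : events) {
            switch (e.type) {
                case OPEN_TXN:
                    break; // never dumped; the txn is opened implicitly on load
                case COMMIT_TXN:
                case ABORT_TXN:
                    if (txnsWithWriteIds.contains(e.txnId)) {
                        dumped.add(e);
                    }
                    break;
                default:
                    dumped.add(e); // ALLOC_WRITE_ID is kept as-is
            }
        }
        return dumped;
    }

    /**
     * Load side: open a txn implicitly the first time one of its ALLOC_WRITE_ID events
     * is replayed. Returns the ids of the txns opened, in order.
     */
    public static List<Long> replay(List<Event> dumped) {
        Set<Long> opened = new LinkedHashSet<>();
        for (Event e : dumped) {
            if (e.type == Type.ALLOC_WRITE_ID) {
                opened.add(e.txnId); // no-op if this txn was already opened
            }
        }
        return new ArrayList<>(opened);
    }
}
```

For example, given OPEN_TXN for txns 1, 2 and 3, an ALLOC_WRITE_ID for txn 1, commits for txns 2 and 1, and an abort for txn 3, with only txn 1 holding write IDs in the replicated database, the dump keeps just ALLOC_WRITE_ID(1) and COMMIT_TXN(1), and the load opens only txn 1.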
--
This message was sent by Atlassian Jira
(v8.20.7#820007)