[
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448964
]
ASF GitHub Bot logged work on HIVE-23725:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Jun/20 00:13
Start Date: 22/Jun/20 00:13
Worklog Time Spent: 10m
Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443269785
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -4979,10 +4979,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,retrylock",
Review comment:
Do we really want `retrylock` to be driven by a config? Shouldn't it
just be enabled?
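For reference, DriverFactory builds one plugin per comma-separated name in `hive.query.reexecution.strategies`, so shipping `retrylock` in the default value enables it out of the box while still letting users opt out by overriding the config. A simplified sketch of that selection logic (the exact loop shape is an assumption, not a verbatim copy of the Hive code):

```java
// Each comma-separated strategy name maps to one IReExecutionPlugin instance,
// so a strategy is enabled exactly when its name appears in the list.
String strategies = conf.getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);
List<IReExecutionPlugin> plugins = new ArrayList<>();
for (String name : strategies.split(",")) {
  plugins.add(buildReExecPlugin(name.trim()));
}
```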
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##########
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
     try {
       if (!validTxnManager.isValidTxnListState()) {
-        LOG.info("Compiling after acquiring locks");
+        LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
         // Snapshot was outdated when locks were acquired, hence regenerate context,
-        // txn list and retry
-        // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-        // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-        // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-        // But if snapshot is not valid, we recompile the query.
-        if (driverContext.isOutdatedTxn()) {
-          driverContext.getTxnManager().rollbackTxn();
-
-          String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-          driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-          lockAndRespond();
-        }
-        driverContext.setRetrial(true);
-        driverContext.getBackupContext().addSubContext(context);
-        driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-        context = driverContext.getBackupContext();
-        driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-            driverContext.getTxnManager().getValidTxns().toString());
-        if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-          validTxnManager.recordValidWriteIds();
-        }
-
-        if (!alreadyCompiled) {
-          // compile internal will automatically reset the perf logger
-          compileInternal(command, true);
-        } else {
-          // Since we're reusing the compiled plan, we need to update its start time for current run
-          driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-        }
-
-        if (!validTxnManager.isValidTxnListState()) {
-          // Throw exception
-          throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+        // txn list and retry (see ReExecutionRetryLockPlugin)
+        try {
+          releaseLocksAndCommitOrRollback(false);
+        } catch (LockException e) {
+          handleHiveException(e, 12);
         }
-
-        //Reset the PerfLogger
-        perfLogger = SessionState.getPerfLogger(true);
-
-        // the reason that we set the txn manager for the cxt here is because each
-        // query has its own ctx object. The txn mgr is shared across the
-        // same instance of Driver, which can run multiple queries.
-        context.setHiveTxnManager(driverContext.getTxnManager());
+        throw handleHiveException(
Review comment:
I think this makes the behavior slightly different from the original logic. For
instance, if another transaction obtains the locks between the moment this
transaction releases them and the moment it tries to acquire them again, does
that mean the transaction would fail a second time? If so, should we have a
specific configuration for the number of retries in this case? For the default
re-execution the number of retries seems to be `1`, but in this case we could
retry several times before failing the query.
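If bounded retries are wanted here, the plugin could keep its own attempt counter against a dedicated limit. A minimal sketch, assuming the plugin implements IReExecutionPlugin#shouldReExecute and assuming a new ConfVar (the `hive.query.reexecution.retrylock.max.count` name is hypothetical, not an existing setting):

```java
// Hypothetical sketch: bound the number of lock-retry re-executions instead of
// failing on the second snapshot invalidation or retrying indefinitely.
private int lockRetryCount = 0;

@Override
public boolean shouldReExecute(int executionNum) {
  // HIVE_QUERY_REEXECUTION_RETRYLOCK_MAX_COUNT is an assumed new ConfVar,
  // analogous to the existing hive.query.reexecution.max.count (default 1).
  int maxRetries = conf.getIntVar(ConfVars.HIVE_QUERY_REEXECUTION_RETRYLOCK_MAX_COUNT);
  return snapshotWasOutdated && lockRetryCount++ < maxRetries;
}
```

`snapshotWasOutdated` stands in for whatever state the plugin records when the txn list check fails; the point is only that this retry budget would be separate from the default re-execution count.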
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java
##########
@@ -64,6 +65,9 @@ private static IReExecutionPlugin buildReExecPlugin(String name) throws RuntimeE
     if (name.equals("reoptimize")) {
       return new ReOptimizePlugin();
     }
+    if (name.endsWith("retrylock")) {
Review comment:
`equals` ?
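Presumably the intended check, matching the `overlay`/`reoptimize` branches above it (`endsWith` would also match any strategy name that merely ends in "retrylock"):

```java
if (name.equals("retrylock")) {
  return new ReExecutionRetryLockPlugin();
}
```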
Issue Time Tracking
-------------------
Worklog Id: (was: 448964)
Time Spent: 1h 10m (was: 1h)
> ValidTxnManager snapshot outdating causing partial reads in merge insert
> ------------------------------------------------------------------------
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and
> starts to read data from transactions that committed after the query was
> compiled, it can cause partial-read problems if a committed transaction
> created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1, which merge-inserts data from a
> partitioned source table that has a few partitions.
> 2. Open, run, and commit transaction 2, which inserts data into an old and a
> new partition of the source table.
> 3. Open, run, and commit transaction 3, which inserts data into the target
> table of the merge statement; this will retrigger snapshot generation in
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read
> partial data from transaction 2, breaking the ACID properties (see the
> sketch after this description).
> A different setup switches the transaction order:
> 1. Compile transaction 1, which inserts data into an old and a new partition
> of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge-inserts data from the source table
> into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains transactions 1 and 2 in its
> snapshot, isValidTxnListState will be triggered and we do a partial read of
> transaction 1 for the same reasons.
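To make the first scenario concrete, here is a hedged sketch in the style of Hive's transaction tests, using Driver#compileAndRespond and Driver#run to interleave the sessions. The table names, schemas, and three-Driver setup are illustrative, not taken from the ticket:

```java
// Illustrative interleaving for scenario 1. Assume source(id int, val string)
// and target(id int, val string), both partitioned by (p string), and three
// separate Driver instances, i.e. three separate transactions.

// txn 1: compile the merge against the current snapshot, but do not run yet.
driver1.compileAndRespond("merge into target t using source s on t.id = s.id"
    + " when matched then update set t.val = s.val"
    + " when not matched then insert values (s.id, s.val, s.p)");

// txn 2: a single transaction writing both an existing ('old') and a
// brand-new ('new') partition of the source table.
driver2.run("insert into source partition (p) values (1, 'x', 'old'), (2, 'y', 'new')");

// txn 3: a write to the merge target, which outdates txn 1's snapshot.
driver3.run("insert into target partition (p='old') values (3, 'z')");

// txn 1 executes: isValidTxnListState() fails, the snapshot is regenerated,
// and without the fix the merge sees txn 2's rows only in partitions that
// existed at compile time, i.e. a partial read.
driver1.run();
```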