-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review220838
-----------------------------------------------------------



The patch looks great, but I have concers. See below.


ql/src/java/org/apache/hadoop/hive/ql/Driver.java
Lines 686 (patched)
<https://reviews.apache.org/r/72528/#comment309552>

    I have concerns here, but I am not sure if they are well founded or not. I 
think this will break what the outside world thinks of snapshot isolation. I 
might have a hypothetical client that inserts lots of data in a source table 
and sometimes issue a merge statement from the source to the target table. They 
have some requirement that the target table can not have partial data regarding 
some property. Example they inserting sales data, and the target table can not 
contain half the data of a day, it can either have all or none. So what the 
clients does, it will issue the inserts into the source table synchronously 
ordered by the date and when it gets to a next day it issue a merge statement 
asynchronously and continues to inserts the data for the next day 
synchronously. And it might think that it is save to do so, since the merge 
statement has a snapshot it will not see the data inserted afterwards. But with 
this change it will break.
    It might not be the best example, since how would the client know when the 
snapshot is actually captured. But I am not familiar enough with the ecosystem, 
does anything use the Hive by issuing the compile and run separately? Because 
there you could be sure before this change, that the compilation order also 
meant snapshot order. So summarized, I don't know what the outside world 
excepts of the snapshot isolation.


- Peter Varga


On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> -----------------------------------------------------------
> 
> (Updated May 19, 2020, 11:19 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23503
>     https://issues.apache.org/jira/browse/HIVE-23503
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> ValidTxnManager doesn't consider txns opened and committed between snapshot 
> generation and locking when evaluating ValidTxnListState. This cause issues 
> like duplicate insert in case of concurrent merge insert & insert.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 
>   ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
> 0383881acc 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
> 600289f837 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 8a15b7cc5d 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  65df9c2ba9 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  887d4303f4 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  312936efa8 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java 
> b8ff03f9c4 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java 
> d4c3b09730 
> 
> 
> Diff: https://reviews.apache.org/r/72528/diff/1/
> 
> 
> Testing
> -------
> 
> DbTxnManager tests.
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>

Reply via email to