----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72528/#review220838 -----------------------------------------------------------
The patch looks great, but I have concers. See below. ql/src/java/org/apache/hadoop/hive/ql/Driver.java Lines 686 (patched) <https://reviews.apache.org/r/72528/#comment309552> I have concerns here, but I am not sure if they are well founded or not. I think this will break what the outside world thinks of snapshot isolation. I might have a hypothetical client that inserts lots of data in a source table and sometimes issue a merge statement from the source to the target table. They have some requirement that the target table can not have partial data regarding some property. Example they inserting sales data, and the target table can not contain half the data of a day, it can either have all or none. So what the clients does, it will issue the inserts into the source table synchronously ordered by the date and when it gets to a next day it issue a merge statement asynchronously and continues to inserts the data for the next day synchronously. And it might think that it is save to do so, since the merge statement has a snapshot it will not see the data inserted afterwards. But with this change it will break. It might not be the best example, since how would the client know when the snapshot is actually captured. But I am not familiar enough with the ecosystem, does anything use the Hive by issuing the compile and run separately? Because there you could be sure before this change, that the compilation order also meant snapshot order. So summarized, I don't know what the outside world excepts of the snapshot isolation. - Peter Varga On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72528/ > ----------------------------------------------------------- > > (Updated May 19, 2020, 11:19 a.m.) > > > Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary. > > > Bugs: HIVE-23503 > https://issues.apache.org/jira/browse/HIVE-23503 > > > Repository: hive-git > > > Description > ------- > > ValidTxnManager doesn't consider txns opened and committed between snapshot > generation and locking when evaluating ValidTxnListState. This cause issues > like duplicate insert in case of concurrent merge insert & insert. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 > ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 > ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java > 0383881acc > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java > 600289f837 > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 8a15b7cc5d > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > 65df9c2ba9 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > 887d4303f4 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java > 312936efa8 > storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java > b8ff03f9c4 > storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java > d4c3b09730 > > > Diff: https://reviews.apache.org/r/72528/diff/1/ > > > Testing > ------- > > DbTxnManager tests. > > > Thanks, > > Denys Kuzmenko > >