-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/765/#review709
-----------------------------------------------------------

trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java
<https://reviews.apache.org/r/765/#comment1417>

    Can you add a comment on the situation in which this exception will be
    thrown? Just for the sake of readers who did not notice that there is a
    primary key constraint in the DDL.


trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java
<https://reviews.apache.org/r/765/#comment1418>

    Remove this?


- Ning


On 2011-05-21 01:49:07, Tomasz Nykiel wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/765/
> -----------------------------------------------------------
>
> (Updated 2011-05-21 01:49:07)
>
>
> Review request for hive.
>
>
> Summary
> -------
>
> Currently, the JDBCStatsPublisher executes two queries per inserted row of
> statistics: a first query to check whether the ID was inserted by another
> task, and a second query to insert a new row or update the existing one.
> The update case occurs very rarely, since duplicates most likely originate
> from speculative failed tasks.
>
> Currently the schema of the stat table is the following:
>
> PARTITION_STAT_TABLE ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have
> any integrity constraints declared.
>
> We amend it to:
>
> PARTITION_STAT_TABLE ( ID VARCHAR(255) PRIMARY KEY , ROW_COUNT BIGINT ).
>
> HIVE-2144 improves performance by greedily executing the INSERT statement.
> Then, instead of executing two queries per inserted row, we can execute one
> INSERT query.
> In the case of a primary key constraint violation, we perform a single
> UPDATE query.
> The UPDATE query needs to check whether the stats currently being inserted
> are "newer" than the ones already in the table.
>
>
> This addresses bug HIVE-2144.
>     https://issues.apache.org/jira/browse/HIVE-2144
>
>
> Diffs
> -----
>
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1125468
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/765/diff
>
>
> Testing
> -------
>
> TestStatsPublisher JUnit test:
> - basic behaviour
> - multiple updates
> - cleanup of the statistics table after aggregation
>
> Standalone testing on the cluster:
> - insert/analyze queries over non-partitioned/partitioned tables
>
> NOTE: For correct behaviour, the primary_key index needs to be created,
> or the PARTITION_STAT_TABLE table dropped - which triggers creation of the
> table with the constraint declared.
>
>
> Thanks,
>
> Tomasz
>
>
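
For readers following the thread, the insert-first flow described in the summary can be sketched roughly as below. This is not the code from the patch: the class GreedyPublishSketch, the in-memory StatTable, and the method names are hypothetical stand-ins for PARTITION_STAT_TABLE and the JDBC statements, and "newer" is assumed here to mean "row count not smaller than the stored one", which the request does not spell out. Catching java.sql.SQLIntegrityConstraintViolationException is likewise an assumption about how the driver reports a primary key violation.

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.HashMap;
import java.util.Map;

public class GreedyPublishSketch {

    /** Hypothetical in-memory stand-in for PARTITION_STAT_TABLE with ID as PRIMARY KEY. */
    static class StatTable {
        private final Map<String, Long> rows = new HashMap<>();
        int statementsExecuted = 0; // counts simulated statements, for illustration only

        // Models an INSERT of (ID, ROW_COUNT); a duplicate ID violates the primary key.
        void insert(String id, long rowCount) throws SQLIntegrityConstraintViolationException {
            statementsExecuted++;
            if (rows.containsKey(id)) {
                throw new SQLIntegrityConstraintViolationException("duplicate key: " + id);
            }
            rows.put(id, rowCount);
        }

        // Models a conditional UPDATE; ASSUMPTION: "newer" means stored ROW_COUNT <= new value.
        // Returns the number of rows changed (0 or 1).
        int updateIfNewer(String id, long rowCount) {
            statementsExecuted++;
            Long current = rows.get(id);
            if (current != null && current <= rowCount) {
                rows.put(id, rowCount);
                return 1;
            }
            return 0;
        }

        Long get(String id) {
            return rows.get(id);
        }
    }

    // Greedy publish: try the INSERT first, fall back to one UPDATE only on a key violation.
    static void publish(StatTable table, String id, long rowCount) {
        try {
            table.insert(id, rowCount);
        } catch (SQLIntegrityConstraintViolationException e) {
            // Thrown when another (e.g. speculative) task already published this ID.
            table.updateIfNewer(id, rowCount);
        }
    }

    public static void main(String[] args) {
        StatTable table = new StatTable();
        publish(table, "part=1", 100L); // common case: one statement
        publish(table, "part=1", 120L); // duplicate ID: INSERT fails, UPDATE applies
        publish(table, "part=1", 90L);  // duplicate ID with older stats: UPDATE changes nothing
        System.out.println(table.get("part=1") + " rows, "
                + table.statementsExecuted + " statements");
    }
}
```

In the common case only the INSERT runs, so the second statement is paid for only when a duplicate ID actually shows up.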