[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137642#comment-16137642 ]
Hive QA commented on HIVE-16886:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883195/datastore-identity-holes.diff

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 10860 tests executed
*Failed tests:*
{noformat}
TestCommands - did not produce a TEST-*.xml file (likely timed out) (batchId=180)
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestEximReplicationTasks - did not produce a TEST-*.xml file (likely timed out) (batchId=180)
TestExport - did not produce a TEST-*.xml file (likely timed out) (batchId=218)
TestHCatClient - did not produce a TEST-*.xml file (likely timed out) (batchId=180)
TestHCatClientNotification - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestHCatHiveThriftCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=218)
TestNoopCommand - did not produce a TEST-*.xml file (likely timed out) (batchId=180)
TestObjectStore - did not produce a TEST-*.xml file (likely timed out) (batchId=201)
TestReplicationScenarios - did not produce a TEST-*.xml file (likely timed out) (batchId=218)
TestReplicationScenariosAcrossInstances - did not produce a TEST-*.xml file (likely timed out) (batchId=218)
TestReplicationTask - did not produce a TEST-*.xml file (likely timed out) (batchId=180)
TestRetriesInRetryingHMSHandler - did not produce a TEST-*.xml file (likely timed out) (batchId=201)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=218)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[repl_dump_requires_admin] (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[repl_load_requires_admin] (batchId=90)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testReplDumpResultSet (batchId=228)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6492/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6492/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6492/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12883195 - PreCommit-HIVE-Build

> HMS log notifications may have duplicated event IDs if multiple HMS are
> running concurrently
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16886
>                 URL: https://issues.apache.org/jira/browse/HIVE-16886
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Sergio Peña
>            Assignee: anishek
>         Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are
> enabled, I could see that notifications can be persisted with a duplicated
> event ID.
> This does not happen when running multiple threads in a single HMS node due
> to the locking acquired on the DbNotificationsLog class, but multiple HMS
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID
> fetched from the datastore is used for the new notification, incremented in
> the server itself, then persisted or updated back to the datastore. If 2
> servers read the same ID, then these 2 servers write a new notification with
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
>   @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>     final int NUM_THREADS = 2;
>     CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>     CountDownLatch countOut = new CountDownLatch(1);
>     HiveConf conf = new HiveConf();
>     conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>     ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>     FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>     for (int i=0; i<NUM_THREADS; i++) {
>       final int n = i;
>       tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>         @Override
>         public Void call() throws Exception {
>           ObjectStore store = new ObjectStore();
>           store.setConf(conf);
>           NotificationEvent dbEvent =
>               new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>           System.out.println("ADDING NOTIFICATION");
>           countIn.countDown();
>           countOut.await();
>           store.addNotificationEvent(dbEvent);
>           System.out.println("FINISH NOTIFICATION");
>           return null;
>         }
>       });
>       executorService.execute(tasks[i]);
>     }
>     countIn.await();
>     countOut.countDown();
>     for (int i = 0; i < NUM_THREADS; ++i) {
>       tasks[i].get();
>     }
>     NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>     Assert.assertEquals(2, eventResponse.getEventsSize());
>     Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>     // This fails because the next notification has an event ID = 1
>     Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails: it expects event ID 2, but the second notification
> also has event ID 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
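
The race described in the issue (each server reads the current event ID, increments it locally, then writes it back) can be illustrated without Hive. The following is a minimal sketch, not Hive code: `EventIdRaceSketch`, `SharedStore`, `unsafeInterleaving` and `atomicAllocation` are hypothetical names, and an `AtomicLong` stands in for the shared datastore. The first method replays the problematic interleaving deterministically. The second shows one possible remedy, allocating the ID in a single atomic step, which is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of the duplicate event ID race; not part of Hive. */
public class EventIdRaceSketch {

    /** Stand-in for the shared datastore: last allocated ID plus persisted event IDs. */
    static final class SharedStore {
        final AtomicLong lastEventId = new AtomicLong(0);
        final List<Long> persistedIds = new ArrayList<>();
    }

    /** Unsafe pattern: both "servers" read the ID before either writes it back. */
    static List<Long> unsafeInterleaving() {
        SharedStore store = new SharedStore();
        long readByServerA = store.lastEventId.get();
        long readByServerB = store.lastEventId.get(); // B reads before A has written

        long idA = readByServerA + 1;                 // incremented inside server A
        store.lastEventId.set(idA);
        store.persistedIds.add(idA);

        long idB = readByServerB + 1;                 // incremented inside server B
        store.lastEventId.set(idB);
        store.persistedIds.add(idB);
        return store.persistedIds;
    }

    /** Safer pattern (assumption): the store hands out each ID in one atomic step. */
    static List<Long> atomicAllocation() {
        SharedStore store = new SharedStore();
        store.persistedIds.add(store.lastEventId.incrementAndGet());
        store.persistedIds.add(store.lastEventId.incrementAndGet());
        return store.persistedIds;
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + unsafeInterleaving()); // prints "unsafe: [1, 1]"
        System.out.println("atomic: " + atomicAllocation());   // prints "atomic: [1, 2]"
    }
}
```

In a real deployment the atomic step would have to happen in the database itself (for example through a row lock or a database-generated sequence), since an in-process `AtomicLong` is not shared between separate HMS instances.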