[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138741#comment-16138741 ]
anishek commented on HIVE-16886:
--------------------------------

Auto-increment values can have holes if:
* a transaction was aborted, or
* the sequence generation happens in application code, by explicitly calling next_val on the sequence behind the auto-increment and then doing the insert from the application; in that case the GC-pause problem mentioned by [~spena] can cause holes or unordered inserts.

So holes can happen and gap-free IDs are not guaranteed. However, case 1 (aborted transactions) does not matter as long as we still get increasing, ordered, unique numbers. As for the second case, if we use auto_increment at the DB level instead of DataNucleus datastore-identity with auto-increment, we should get the IDs in order.

The test provided by [~spena] works on MySQL with both mappings:
* negative mapping -- the existing mapping, on MySQL.
* positive mapping -- auto-increment on NL_ID as part of the create-table statement for the notification log, on a fresh DB.

For the patch, rather than reusing the existing columns, I think I will create another column that is auto-increment at the DB level. I will also try the fix on a PostgreSQL DB to see whether the behavior differs. Even if the race happens at the DB level, i.e. two transactions are committed from the application (HMS) at the same time, the DB will order them depending on which acquires the auto_increment lock first; and since replication is not realtime, this lag of a few nano/microseconds should not be a problem.

Retrying the whole metastore operation with optimistic locking in application code would just invite a lot of retries on the HMS side, with the possibility of redoing complete metastore operations when one commit is larger than the others and something fails. Additionally, that would require perfect distributed transactions across the RDBMS + HDFS.
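The approach above (let the DB assign the ID atomically, instead of read-then-increment in the application) can be sketched without a metastore. This is a minimal simulation, not Hive code: the shared `AtomicLong` stands in for a DB-generated auto_increment column, and each "server" thread stands in for one HMS instance; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class DbAssignedIdSketch {
    // Stands in for the DB's auto_increment counter: the DB hands out the
    // next value atomically under its own lock, so no two inserts share an ID.
    static final AtomicLong dbAutoIncrement = new AtomicLong(0);

    // Simulates one notification insert; the ID is assigned by the "DB",
    // never read-then-incremented by the application.
    static long insertNotification() {
        return dbAutoIncrement.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        final int numServers = 4, insertsPerServer = 1000;
        ExecutorService pool = Executors.newFixedThreadPool(numServers);
        Set<Long> ids = Collections.newSetFromMap(new ConcurrentHashMap<>());

        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(numServers);
        for (int s = 0; s < numServers; s++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < insertsPerServer; i++) {
                        ids.add(insertNotification());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();
        done.await();
        pool.shutdown();

        // Every ID is unique: any duplicate would have collapsed in the set.
        System.out.println("unique ids: " + ids.size()); // 4000
    }
}
```

Note the IDs are unique and increasing but not necessarily gap-free, which matches the comment: an aborted insert would consume a value, and that is acceptable.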
> HMS log notifications may have duplicated event IDs if multiple HMS are
> running concurrently
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16886
>                 URL: https://issues.apache.org/jira/browse/HIVE-16886
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Sergio Peña
>            Assignee: anishek
>         Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers with DB notifications enabled, I could see that notifications can be persisted with a duplicated event ID.
> This does not happen when running multiple threads in a single HMS node, due to the lock acquired in the DbNotificationsLog class, but multiple HMS instances can conflict.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID fetched from the datastore is used for the new notification, incremented in the server itself, then persisted or updated back to the datastore. If 2 servers read the same ID, both write a new notification with the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS,
>       MockPartitionExpressionProxy.class.getName());
>
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>
>         NotificationEvent dbEvent = new NotificationEvent(0, 0,
>             EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>
>         System.out.println("ADDING NOTIFICATION");
>
>         countIn.countDown();
>         countOut.await();
>
>         store.addNotificationEvent(dbEvent);
>
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>
>     executorService.execute(tasks[i]);
>   }
>
>   countIn.await();
>   countOut.countDown();
>
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>
>   NotificationEventResponse eventResponse =
>       objectStore.getNextNotification(new NotificationEventRequest());
>
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: it expects event ID 2 but the second notification also has event ID 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
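The read-increment-write race described in the issue can also be shown deterministically, without a metastore or latches. The following is a hedged sketch, not Hive code: `storedNextEventId` models the sequence value kept in the datastore, the helper names are hypothetical, and the two "servers" are interleaved by hand in exactly the losing order.

```java
public class DuplicateEventIdSketch {
    // Models the next-event-ID value stored in the datastore.
    static long storedNextEventId = 1;

    // Step 1 of the buggy flow: each server reads the current ID.
    static long read() { return storedNextEventId; }

    // Steps 2-3: the server increments in memory and writes the ID back.
    static void writeBack(long incremented) { storedNextEventId = incremented; }

    public static void main(String[] args) {
        // Both servers read before either writes back -- the losing interleaving.
        long serverA = read();      // reads 1
        long serverB = read();      // also reads 1
        writeBack(serverA + 1);     // server A persists its event with ID 1
        writeBack(serverB + 1);     // server B persists its event with ID 1 too

        System.out.println("server A event id: " + serverA); // 1
        System.out.println("server B event id: " + serverB); // 1
        System.out.println("duplicate: " + (serverA == serverB)); // true
    }
}
```

With a DB-level auto_increment column, steps 1-3 collapse into a single atomic assignment inside the database, so this interleaving cannot produce a duplicate.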