[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138741#comment-16138741
 ] 

anishek commented on HIVE-16886:
--------------------------------

Auto-increment values can have holes if

* a transaction was aborted, or
* the sequence is generated in code, by explicitly calling next_val on the 
sequence behind the auto increment and then doing the insert from the 
application. In that case the GC pauses in the app mentioned by [~spena] can 
cause holes or out-of-order inserts.

So gap-free IDs are not guaranteed, because of the above cases. However, we 
should not care about case 1 (aborted transactions) as long as we get unique, 
increasing, ordered numbers. 

As for the second case, if we use auto_increment at the DB level instead of 
the DataNucleus datastore-identity with auto-increment, we should be able to 
get the IDs in order. 
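
To make the difference between the two schemes concrete, here is a minimal 
sketch. It does not use the real ObjectStore or a real database: FakeStore, 
EventIdSketch and all method names are hypothetical stand-ins, and an 
AtomicLong plays the role of a DB-level auto_increment. It replays one fixed 
interleaving of two HMS instances rather than running real threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class EventIdSketch {

    /** Hypothetical stand-in for the notification log and its "next event ID" row. */
    static class FakeStore {
        long nextEventId = 1;                               // read and written back by the application
        final AtomicLong autoIncrement = new AtomicLong();  // stands in for a DB-level auto_increment
        final List<Long> persistedIds = new ArrayList<>();

        // Application-side scheme, step 1: read the current ID.
        long readNextEventId() {
            return nextEventId;
        }

        // Application-side scheme, step 2: persist the event and write the bumped ID back.
        void writeEvent(long id) {
            persistedIds.add(id);
            nextEventId = id + 1;
        }

        // DB-side scheme: the store assigns the ID as part of the insert itself.
        long insertWithAutoIncrement() {
            long id = autoIncrement.incrementAndGet();
            persistedIds.add(id);
            return id;
        }
    }

    /** Two servers both read the ID before either one writes: duplicate IDs. */
    static List<Long> appSideIds() {
        FakeStore store = new FakeStore();
        long idSeenByA = store.readNextEventId();
        long idSeenByB = store.readNextEventId(); // B reads before A has written
        store.writeEvent(idSeenByA);
        store.writeEvent(idSeenByB);
        return store.persistedIds;
    }

    /** The same two inserts when the store hands out the ID: unique and ordered. */
    static List<Long> storeSideIds() {
        FakeStore store = new FakeStore();
        store.insertWithAutoIncrement();
        store.insertWithAutoIncrement();
        return store.persistedIds;
    }

    public static void main(String[] args) {
        System.out.println("application-side IDs: " + appSideIds());   // [1, 1]
        System.out.println("store-side IDs:       " + storeSideIds()); // [1, 2]
    }
}
```

The first list corresponds to the duplicate event IDs reported in this issue; 
the second is what a DB-assigned ID is expected to give, assuming the database 
serializes the increment.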

The test provided by [~spena] works on MySQL with both the positive and the 
negative mapping. 

negative mapping -- the existing mapping with MySQL.
positive mapping -- auto increment declared on NL_ID as part of create table 
for the notification log, on a fresh db. 

For the patch, rather than using the existing columns, I think I will create 
another column that is auto increment at the DB level. I will also try the fix 
on a PostgreSQL db to see if it behaves differently. A race can still happen 
at the DB level: if two transactions are committed from the application (HMS) 
at the same time, the DB will order them depending on which one acquires the 
lock on auto_increment first. However, since replication is not real time, 
this lag of a few nano/microseconds should not be a problem. 

Retrying the whole metastore operation with optimistic locking in application 
code is just calling for a lot of retries on the HMS side, with the 
possibility of complete metastore operations having to be redone if something 
fails when one commit is larger than the others. Additionally, this would need 
us to do perfect distributed transactions across rdbms + hdfs. 
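
As a rough illustration of the retry cost, here is a minimal sketch of 
optimistic locking on the event ID. Nothing in it is HMS code: 
OptimisticRetrySketch, runMetastoreOperation and tryCommit are hypothetical 
names, and AtomicLong.compareAndSet stands in for a conditional update in the 
database. It replays one fixed interleaving of two operations, A and B.

```java
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticRetrySketch {

    // Stand-in for the "last event ID" row in the metastore DB.
    static final AtomicLong lastEventId = new AtomicLong(0);

    // Counts how many times a complete metastore operation had to be executed.
    static int operationRuns = 0;

    // Hypothetical stand-in for a full metastore operation (DB + HDFS work).
    static void runMetastoreOperation(String name) {
        operationRuns++;
    }

    // Commit succeeds only if nobody else committed since the ID was read.
    static boolean tryCommit(long idReadAtStart) {
        return lastEventId.compareAndSet(idReadAtStart, idReadAtStart + 1);
    }

    /** Replays A and B starting from the same ID; returns total operation runs. */
    static int simulate() {
        lastEventId.set(0);
        operationRuns = 0;

        long seenByA = lastEventId.get();
        long seenByB = lastEventId.get();   // B starts before A commits

        runMetastoreOperation("A");
        runMetastoreOperation("B");

        tryCommit(seenByA);                 // A commits first and wins
        boolean bCommitted = tryCommit(seenByB); // B's ID is stale, commit fails

        // B has to redo the whole operation until its commit goes through.
        while (!bCommitted) {
            seenByB = lastEventId.get();
            runMetastoreOperation("B");
            bCommitted = tryCommit(seenByB);
        }
        return operationRuns;
    }

    public static void main(String[] args) {
        System.out.println("operation runs for 2 events: " + simulate()); // 3
    }
}
```

Even with only two concurrent operations, one of them runs twice. With more 
HMS instances, or with long operations, the number of reruns can only grow, 
and the sketch ignores the harder part of undoing any HDFS side effects of the 
failed attempt.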

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16886
>                 URL: https://issues.apache.org/jira/browse/HIVE-16886
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Sergio Peña
>            Assignee: anishek
>         Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
>     final int NUM_THREADS = 2;
>     CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>     CountDownLatch countOut = new CountDownLatch(1);
>     HiveConf conf = new HiveConf();
>     conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
>     ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
>     FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>     for (int i=0; i<NUM_THREADS; i++) {
>       final int n = i;
>       tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>         @Override
>         public Void call() throws Exception {
>           ObjectStore store = new ObjectStore();
>           store.setConf(conf);
>           NotificationEvent dbEvent =
>               new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>           System.out.println("ADDING NOTIFICATION");
>           countIn.countDown();
>           countOut.await();
>           store.addNotificationEvent(dbEvent);
>           System.out.println("FINISH NOTIFICATION");
>           return null;
>         }
>       });
>       executorService.execute(tasks[i]);
>     }
>     countIn.await();
>     countOut.countDown();
>     for (int i = 0; i < NUM_THREADS; ++i) {
>       tasks[i].get();
>     }
>     NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
>     Assert.assertEquals(2, eventResponse.getEventsSize());
>     Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>     // This fails because the next notification has an event ID = 1
>     Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails: the second event has ID 1 instead of the expected 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
