[
https://issues.apache.org/jira/browse/HADOOP-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiao Chen updated HADOOP-12699:
-------------------------------
Attachment: HADOOP-12699.06.patch
Thanks [~asuresh] for the comment and for confirming the fix direction!
I found what was missing in my previously mentioned approach: I was only
removing the runnables from the executor's queue, but did not correctly handle
the runnables that had already started.
bq. in the drain call, do an executors.shutdownnow(), clear all the queues and
restart the executor threads.
My thought on this is that shutting down all tasks may be overkill - we only
need to cancel the tasks for the key that's being rolled. Also, going that way
we would need to handle the {{executorThreadsStarted}} race between
{{submitRefillTask}} and the restart. What do you think about my approach below?
Attached patch 6; locally ran the repro snippet 10k times with no failure.
- As discussed, focusing on fixing the {{ValueQueue}}, instead of hacking it
for the test.
- On {{ValueQueue#drain}}, remove all {{NamedRunnable}}s from the
{{ThreadPoolExecutor}}'s queue for the specific key being drained. Updated the
{{keysInProgress}} data structure in {{NamedRunnable}} from a set to a map to
achieve this.
- We also need to cancel any {{NamedRunnable}} that is already executing.
-* Since we're not submitting to the executor, we can't get a {{Future}} to
cancel. I added an {{AtomicBoolean}} to flag whether a {{NamedRunnable}} is
canceled, and manually cancel the job / clear the wrongly-filled queue
according to this flag.
-* We need to handle the race on {{keyQueue}} between drain and get. Added a
{{ReadWriteLock}} for this purpose. Each key uses a separate lock, so that
operations on different keys don't impact each other.
-* Since the {{keyQueue}}s are loaded by a {{LoadingCache}}, I didn't find a
perfect place to initialize the locks for them, so lazy initialization is used
to create the locks.
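To make the idea concrete, here is a minimal standalone sketch (not the patch itself - class and field names like {{DrainSketch}}, {{cancelled}}, and {{lockFor}} are illustrative, and the refill is simulated) of the cancellation flag plus the lazily created per-key {{ReadWriteLock}}:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DrainSketch {

    // Illustrative stand-in for ValueQueue's NamedRunnable: carries the key
    // name plus a cancellation flag that is checked before publishing the
    // refilled value, so an already-started refill becomes a no-op.
    static class NamedRunnable implements Runnable {
        final String keyName;
        final Queue<String> keyQueue;
        final AtomicBoolean cancelled = new AtomicBoolean(false);

        NamedRunnable(String keyName, Queue<String> keyQueue) {
            this.keyName = keyName;
            this.keyQueue = keyQueue;
        }

        @Override
        public void run() {
            String value = keyName + "@refilled"; // simulate fetching a value
            if (!cancelled.get()) {
                keyQueue.add(value); // only publish if not drained meanwhile
            }
        }
    }

    // Per-key locks, created lazily since the queues themselves come from a
    // cache and there is no single initialization point.
    static final ConcurrentMap<String, ReadWriteLock> locks =
        new ConcurrentHashMap<>();

    static ReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    // Simulates drain("k6") racing with an already-started refill task.
    static boolean demo() {
        Queue<String> q = new ConcurrentLinkedQueue<>();
        NamedRunnable inFlight = new NamedRunnable("k6", q);

        // drain: cancel the in-flight refill and clear the queue under the
        // key's write lock, so gets on the same key cannot interleave.
        Lock w = lockFor("k6").writeLock();
        w.lock();
        try {
            inFlight.cancelled.set(true);
            q.clear();
        } finally {
            w.unlock();
        }

        inFlight.run(); // the cancelled refill does not publish its value
        return q.isEmpty(); // no stale value leaked into the queue
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the lock is per key, so draining {{k6}} never blocks gets on other keys, which matches the goal of keeping the impact scoped to the key being rolled.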
I understand this patch adds locking and may impact performance, but I think
that is the trade-off for supporting the same client rolling and getting (and
I feel we should support that). I've been trying to keep the impact
(locking / runnable canceling) scoped to each key, which is how roll and get
work currently.
Please review and share your thoughts. Thank you!
> TestKMS#testKMSProvider intermittently fails during 'test rollover draining'
> ----------------------------------------------------------------------------
>
> Key: HADOOP-12699
> URL: https://issues.apache.org/jira/browse/HADOOP-12699
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: HADOOP-12699.01.patch, HADOOP-12699.02.patch,
> HADOOP-12699.03.patch, HADOOP-12699.04.patch, HADOOP-12699.06.patch,
> HADOOP-12699.repro.2, HADOOP-12699.repro.patch
>
>
> I've seen several failures of testKMSProvider, all failed in the following
> snippet:
> {code}
> // test rollover draining
> KeyProviderCryptoExtension kpce = KeyProviderCryptoExtension.
> createKeyProviderCryptoExtension(kp);
> .....
> EncryptedKeyVersion ekv1 = kpce.generateEncryptedKey("k6");
> kpce.rollNewVersion("k6");
> EncryptedKeyVersion ekv2 = kpce.generateEncryptedKey("k6");
> Assert.assertNotEquals(ekv1.getEncryptionKeyVersionName(),
> ekv2.getEncryptionKeyVersionName());
> {code}
> with error message
> {quote}Values should be different. Actual: k6@0{quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)