[ https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136145#comment-17136145 ]
ASF subversion and git services commented on NIFI-7527:
-------------------------------------------------------

Commit e02ffdd99fb3e0f561a50903f491b392a1a505cc in nifi's branch refs/heads/master from Tamas Palfy
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e02ffdd ]

NIFI-7527 AbstractKuduProcessor refresh TGT deadlock fix: Redesigned locking.
NIFI-7527 Fixed StackOverFlowError due to pacing issue (recursive login before loggedIn flag is set).
NIFI-7527 Refactor: removed redundant kudu client creation.

This closes #4330.

Signed-off-by: Peter Turcsanyi <turcsa...@apache.org>

> AbstractKuduProcessor deadlocks after TGT refresh
> --------------------------------------------------
>
> Key: NIFI-7527
> URL: https://issues.apache.org/jira/browse/NIFI-7527
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Tamas Palfy
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos
> issue after TGT expires) introduced a new bug: after TGT refresh the
> processor ends up in a deadlock.
> The reason is that the onTrigger initiates a read lock:
> {code:java}
>     @Override
>     public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
>         kuduClientReadLock.lock();
>         try {
>             onTrigger(context, session, kuduClientR);
>         } finally {
>             kuduClientReadLock.unlock();
>         }
>     }
> {code}
> and while the read lock is in effect, later (in the same stack) - if TGT
> refresh occurs - a write lock is attempted:
> {code:java}
> ...
>         public synchronized boolean checkTGTAndRelogin() throws LoginException {
>             boolean didRelogin = super.checkTGTAndRelogin();
>             if (didRelogin) {
>                 createKuduClient(context);
>             }
>             return didRelogin;
>         }
> ...
>     protected void createKuduClient(ProcessContext context) {
>         kuduClientWriteLock.lock();
>         try {
>             if (this.kuduClientR.get() != null) {
>                 try {
>                     this.kuduClientR.get().close();
>                 } catch (KuduException e) {
>                     getLogger().error("Couldn't close Kudu client.");
>                 }
>             }
>             if (kerberosUser != null) {
>                 final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
>                 this.kuduClientR.set(kerberosAction.execute());
>             } else {
>                 this.kuduClientR.set(buildClient(context));
>             }
>         } finally {
>             kuduClientWriteLock.unlock();
>         }
>     }
> {code}
> This attempt at the write lock will get stuck, waiting for the previous read
> lock to be released.
> (Other threads may have acquired the same read lock, but they can release it
> eventually - unless they too try to acquire the write lock.)
> For the fix, it seemed best to re-evaluate the locking logic.
> Previously, basically the whole onTrigger logic was wrapped in a read
> lock, including checking - and recreating as needed - the Kudu client
> (as explained above).
> It's best to keep only the actual privileged action in the read lock, so that
> refreshing the TGT and re-creating the Kudu client can safely be
> done in a write lock before that.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
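The read-to-write lock upgrade described in the issue can be reproduced in isolation. The following is a minimal sketch, not the code from the commit: it assumes the locks come from a java.util.concurrent.locks.ReentrantReadWriteLock, and the class name LockUpgradeDemo, its method names, and the String stand-in for the Kudu client are all hypothetical. The first method shows that a thread holding the read lock cannot obtain the write lock (a timed tryLock is used so the demo returns instead of hanging). The second shows the ordering the description proposes: recreate the client under the write lock first, release it, then do the actual work under the read lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockUpgradeDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Lock readLock = lock.readLock();
    private final Lock writeLock = lock.writeLock();
    // Hypothetical stand-in for the Kudu client reference; a String keeps the sketch self-contained.
    private final AtomicReference<String> client = new AtomicReference<>("client-1");

    /** Problem shape: ask for the write lock while this thread still holds the read lock. */
    boolean tryRecreateWhileReading() {
        readLock.lock();
        try {
            // A plain writeLock.lock() would block forever here, because
            // ReentrantReadWriteLock does not support upgrading a read lock.
            boolean acquired = writeLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                writeLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            readLock.unlock();
        }
    }

    /** Fix shape: recreate under the write lock first, release it, then work under the read lock. */
    String refreshThenUse(boolean reloginHappened) {
        if (reloginHappened) {
            writeLock.lock();
            try {
                client.set("client-2"); // stands in for re-creating the client
            } finally {
                writeLock.unlock();
            }
        }
        readLock.lock();
        try {
            return "used " + client.get(); // stands in for the privileged action
        } finally {
            readLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockUpgradeDemo demo = new LockUpgradeDemo();
        System.out.println("write lock acquired while reading: " + demo.tryRecreateWhileReading());
        System.out.println(demo.refreshThenUse(true));
    }
}
```

Running main prints "write lock acquired while reading: false" followed by "used client-2". Because the write lock is fully released before the read lock is taken, no thread ever waits on a lock that it is itself blocking.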