[ 
https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Palfy updated NIFI-7527:
------------------------------
    Description: 
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos 
issue after TGT expires) introduced a new bug: after TGT refresh the processor 
ends up in a deadlock.

The reason is that the onTrigger initiates a read lock:
{code:java}
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession 
session) throws ProcessException {
        kuduClientReadLock.lock();
        try {
            onTrigger(context, session, kuduClientR);
        } finally {
            kuduClientReadLock.unlock();
        }
    }
{code}
and while the read lock is in effect, later (in the same stack) - if TGT 
refresh occurs - a write lock is attempted:
{code:java}
...
            public synchronized boolean checkTGTAndRelogin() throws 
LoginException {
                boolean didRelogin = super.checkTGTAndRelogin();

                if (didRelogin) {
                    createKuduClient(context);
                }

                return didRelogin;
            }
...

    protected void createKuduClient(ProcessContext context) {
        kuduClientWriteLock.lock();
        try {
            if (this.kuduClientR.get() != null) {
                try {
                    this.kuduClientR.get().close();
                } catch (KuduException e) {
                    getLogger().error("Couldn't close Kudu client.");
                }
            }

            if (kerberosUser != null) {
                final KerberosAction<KuduClient> kerberosAction = new 
KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
                this.kuduClientR.set(kerberosAction.execute());
            } else {
                this.kuduClientR.set(buildClient(context));
            }
        } finally {
            kuduClientWriteLock.unlock();
        }
    }
{code}
This write lock will stuck waiting to the previous read lock to get release.
(Other threads may have acquired the same readlock but they can release it 
eventually - unless they too try to acquire the write lock themselves.)

The fix should be fairly simple: need to release the read lock before trying to 
acquire the write lock.
Need to take care of the following though:
* Need to reacquire the read lock after the new Kudu client has been created 
because the current logic calls an unlock on the ReadLock object later in a 
finally block. As this is a reentrant lock, that means decreasing the counter 
for the number of locks. We need to make sure the that number of locks and 
unlocks match.
* Need to make sure whenever we reference the Kudu client, it's actually the 
latest one. volatile no longer suffice, we need to wrap it in an 
AtomicReference.

  was:
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos 
issue after TGT expires) introduced a new bug: after TGT refresh the processor 
ends up in a deadlock.

The reason is that the onTrigger initiates a read lock:
{code:java}
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession 
session) throws ProcessException {
        kuduClientReadLock.lock();
        try {
            onTrigger(context, session, kuduClientR);
        } finally {
            kuduClientReadLock.unlock();
        }
    }
{code}
and while the read lock has been acquired, later (in the same stack) - if TGT 
refresh occurs - a write lock is attempted:
{code:java}
...
            public synchronized boolean checkTGTAndRelogin() throws 
LoginException {
                boolean didRelogin = super.checkTGTAndRelogin();

                if (didRelogin) {
                    createKuduClient(context);
                }

                return didRelogin;
            }
...

    protected void createKuduClient(ProcessContext context) {
        kuduClientWriteLock.lock();
        try {
            if (this.kuduClientR.get() != null) {
                try {
                    this.kuduClientR.get().close();
                } catch (KuduException e) {
                    getLogger().error("Couldn't close Kudu client.");
                }
            }

            if (kerberosUser != null) {
                final KerberosAction<KuduClient> kerberosAction = new 
KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
                this.kuduClientR.set(kerberosAction.execute());
            } else {
                this.kuduClientR.set(buildClient(context));
            }
        } finally {
            kuduClientWriteLock.unlock();
        }
    }
{code}
This write lock will stuck waiting to the previous read lock to get release.
(Other threads may have acquired the same readlock but they can release it 
eventually - unless they too try to acquire the write lock themselves.)

The fix should be fairly simple: need to release the read lock before trying to 
acquire the write lock.
Need to take care of the following though:
* Need to reacquire the read lock after the new Kudu client has been created 
because the current logic calls an unlock on the ReadLock object later in a 
finally block. As this is a reentrant lock, that means decreasing the counter 
for the number of locks. We need to make sure the that number of locks and 
unlocks match.
* Need to make sure whenever we reference the Kudu client, it's actually the 
latest one. volatile no longer suffice, we need to wrap it in an 
AtomicReference.


> AbstractKuduProcessor deadlocks after TGT refresh 
> --------------------------------------------------
>
>                 Key: NIFI-7527
>                 URL: https://issues.apache.org/jira/browse/NIFI-7527
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Tamas Palfy
>            Priority: Major
>
> The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos 
> issue after TGT expires) introduced a new bug: after TGT refresh the 
> processor ends up in a deadlock.
> The reason is that the onTrigger initiates a read lock:
> {code:java}
>     @Override
>     public void onTrigger(final ProcessContext context, final ProcessSession 
> session) throws ProcessException {
>         kuduClientReadLock.lock();
>         try {
>             onTrigger(context, session, kuduClientR);
>         } finally {
>             kuduClientReadLock.unlock();
>         }
>     }
> {code}
> and while the read lock is in effect, later (in the same stack) - if TGT 
> refresh occurs - a write lock is attempted:
> {code:java}
> ...
>             public synchronized boolean checkTGTAndRelogin() throws 
> LoginException {
>                 boolean didRelogin = super.checkTGTAndRelogin();
>                 if (didRelogin) {
>                     createKuduClient(context);
>                 }
>                 return didRelogin;
>             }
> ...
>     protected void createKuduClient(ProcessContext context) {
>         kuduClientWriteLock.lock();
>         try {
>             if (this.kuduClientR.get() != null) {
>                 try {
>                     this.kuduClientR.get().close();
>                 } catch (KuduException e) {
>                     getLogger().error("Couldn't close Kudu client.");
>                 }
>             }
>             if (kerberosUser != null) {
>                 final KerberosAction<KuduClient> kerberosAction = new 
> KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
>                 this.kuduClientR.set(kerberosAction.execute());
>             } else {
>                 this.kuduClientR.set(buildClient(context));
>             }
>         } finally {
>             kuduClientWriteLock.unlock();
>         }
>     }
> {code}
> This write lock will stuck waiting to the previous read lock to get release.
> (Other threads may have acquired the same readlock but they can release it 
> eventually - unless they too try to acquire the write lock themselves.)
> The fix should be fairly simple: need to release the read lock before trying 
> to acquire the write lock.
> Need to take care of the following though:
> * Need to reacquire the read lock after the new Kudu client has been created 
> because the current logic calls an unlock on the ReadLock object later in a 
> finally block. As this is a reentrant lock, that means decreasing the counter 
> for the number of locks. We need to make sure the that number of locks and 
> unlocks match.
> * Need to make sure whenever we reference the Kudu client, it's actually the 
> latest one. volatile no longer suffice, we need to wrap it in an 
> AtomicReference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to