[
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843823#comment-16843823
]
Emmanuel Lecharny commented on DIRMINA-1107:
--------------------------------------------
Guus,
I have checked Jonathan's patch on my machine (Mac OSX with Java 1.8.0_192),
and it works like a charm.
Would you give it a try?
Thanks!
{noformat}
$ git clone --single-branch http://gitbox.apache.org/repos/asf/mina.git -b DIRMINA-1107 mina-DIRMINA-1107
$ cd mina-DIRMINA-1107/
$ git status
On branch DIRMINA-1107
Your branch is up to date with 'origin/DIRMINA-1107'.
$ mvn clean install -Pserial
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache MINA 2.1.3-SNAPSHOT:
[INFO]
[INFO] Apache MINA ........................................ SUCCESS [ 1.515 s]
[INFO] Apache MINA Legal .................................. SUCCESS [ 1.327 s]
[INFO] Apache MINA Core ................................... SUCCESS [02:00 min]
[INFO] Apache MINA APR Transport .......................... SUCCESS [ 0.453 s]
[INFO] Apache MINA Compression Filter ..................... SUCCESS [ 1.049 s]
[INFO] Apache MINA State Machine .......................... SUCCESS [ 1.336 s]
[INFO] Apache MINA JavaBeans Integration .................. SUCCESS [ 1.010 s]
[INFO] Apache MINA XBean Integration ...................... SUCCESS [ 1.692 s]
[INFO] Apache MINA OGNL Integration ....................... SUCCESS [ 0.281 s]
[INFO] Apache MINA JMX Integration ........................ SUCCESS [ 0.290 s]
[INFO] Apache MINA Examples ............................... SUCCESS [ 3.038 s]
[INFO] Apache MINA HTTP client and server codec ........... SUCCESS [ 0.951 s]
[INFO] Apache MINA Serial Communication support ........... SUCCESS [ 0.305 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:14 min
[INFO] Finished at: 2019-05-20T11:46:13+02:00
[INFO] ------------------------------------------------------------------------
{noformat}
> SslHandler flushScheduledEvents race condition, redux
> -----------------------------------------------------
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes
> negative while, at the same time, events in {{filterWriteEventQueue}} go
> unprocessed.
> We suspect that this is triggered by the first thread decrementing the
> counter after a second thread has incremented it, but before that second
> thread has attempted to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to
> zero and release the lock, after which the second thread acquires the lock
> successfully. Now, the second thread processes any elements in
> {{filterWriteEventQueue}}, and then processes any elements in
> {{messageReceivedEventQueue}}. If, in between these two steps, yet another
> thread adds a new element to {{filterWriteEventQueue}}, this element can go
> unprocessed (the second thread does not loop, since the counter is zero or
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed.
> We've removed the counter, and check whether the queues are empty instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty() ||
>                     !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> This code change, as illustrated above, does introduce a new potential
> problem. Theoretically, a thread could add an event to one of the queues and
> call {{flushScheduledEvents}}, only for {{sslLock.tryLock()}} to return
> {{false}} because another thread has just finished the {{while}} loop but
> has not yet released the lock. This would again cause events to go
> unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable
> applying this change as-is.
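
One possible way to close the window described above is to re-check the queues after the lock has been released, and to retry the {{tryLock()}} if anything is left. The sketch below illustrates that idea only; it is not necessarily what the patch on the DIRMINA-1107 branch does. The class {{FlushSketch}}, the integer "events", the {{processed}} counter and {{runDemo}} are hypothetical stand-ins, and only the flush logic of {{SslHandler}} is modelled.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for SslHandler: only the flush logic is modelled.
final class FlushSketch {
    private final Queue<Integer> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final Queue<Integer> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock sslLock = new ReentrantLock();
    // Counts drained events so the demo can check that none were lost.
    private final AtomicInteger processed = new AtomicInteger();

    void scheduleFilterWrite(int event) { filterWriteEventQueue.add(event); }

    void scheduleMessageReceived(int event) { messageReceivedEventQueue.add(event); }

    void flushScheduledEvents() {
        do {
            if (!sslLock.tryLock()) {
                // Another thread holds the lock. It re-checks the queues after
                // unlocking, so it will pick up whatever this caller enqueued.
                return;
            }
            try {
                while (filterWriteEventQueue.poll() != null) {
                    processed.incrementAndGet();
                }
                while (messageReceivedEventQueue.poll() != null) {
                    processed.incrementAndGet();
                }
            } finally {
                sslLock.unlock();
            }
            // The emptiness check happens AFTER the unlock: an event added while
            // the lock was still held is seen here and triggers another pass.
        } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
    }

    int processedCount() { return processed.get(); }

    // Starts several producer threads that enqueue and flush concurrently,
    // waits for them, and returns the number of events that were drained.
    static int runDemo(int producers, int eventsPerProducer) {
        FlushSketch handler = new FlushSketch();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final boolean writes = (p % 2 == 0);
            threads[p] = new Thread(() -> {
                for (int i = 0; i < eventsPerProducer; i++) {
                    if (writes) {
                        handler.scheduleFilterWrite(i);
                    } else {
                        handler.scheduleMessageReceived(i);
                    }
                    handler.flushScheduledEvents();
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handler.processedCount();
    }

    public static void main(String[] args) {
        // Prints the number of processed events; it should match 4 * 10_000.
        System.out.println(runDemo(4, 10_000));
    }
}
```

The design choice here is that a thread which fails {{tryLock()}} hands responsibility to the current lock holder, and the holder only gives up that responsibility once it has observed both queues empty after releasing the lock. Whether this is acceptable inside the real {{SslHandler}} (for example with respect to event ordering) would still need to be checked against the actual code.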
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)