keith-turner commented on issue #992: Notifications never processed while 
running stress test
URL: https://github.com/apache/fluo/issues/992#issuecomment-355066495
 
 
   I was able to confirm that the problem was caused by 
NotificationTracker.requeue() accessing a hashmap without synchronization.  
   
   I suspected that when `containsKey()` is called on the hashmap outside of a 
synchronized block, it could return false even though the map does contain the 
key.  This could happen if another thread is rehashing the map at the same 
time.  I wrote a standalone test with two threads and a hashmap to confirm 
this.  One thread constantly called `containsKey()` for a key known to be in 
the map while another thread constantly inserted data.  Sometimes the 
`containsKey()` call returned false even though the map contained the key.
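
A minimal sketch of such a standalone test might look like the following. It is not the original test: the class name `HashMapRaceSketch`, the method `countMisses`, and the insert counts are all hypothetical, and whether the unsynchronized run actually reports misses depends on timing, the JVM, and the hardware.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

public class HashMapRaceSketch {

  /**
   * Starts one writer thread that inserts `inserts` new keys while the calling
   * thread repeatedly looks up a key that is always present. Returns how many
   * lookups wrongly reported that key as missing.
   */
  public static long countMisses(boolean useLock, int inserts) throws InterruptedException {
    final Map<Integer, Integer> map = new HashMap<>();
    final Object lock = new Object();
    final Integer knownKey = -1;
    map.put(knownKey, 0); // inserted before the writer starts and never removed

    final AtomicBoolean done = new AtomicBoolean(false);
    Thread writer = new Thread(() -> {
      try {
        for (int i = 0; i < inserts; i++) {
          if (useLock) {
            synchronized (lock) {
              map.put(i, i);
            }
          } else {
            map.put(i, i); // unsynchronized: table resizes can race with the reader
          }
        }
      } finally {
        done.set(true);
      }
    });
    writer.start();

    long misses = 0;
    while (!done.get()) {
      boolean found;
      if (useLock) {
        synchronized (lock) {
          found = map.containsKey(knownKey);
        }
      } else {
        found = map.containsKey(knownKey);
      }
      if (!found) {
        misses++; // the key is in the map, so this lookup result is wrong
      }
    }
    writer.join();
    return misses;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("unsynchronized misses: " + countMisses(false, 2_000_000));
    System.out.println("synchronized misses:   " + countMisses(true, 2_000_000));
  }
}
```

With `useLock` set to true, every access to the map goes through the same monitor, so a lookup can never observe a resize in progress and the miss count stays at zero. The unsynchronized run may or may not report misses on any given execution.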
   
   I patched the code in the following way to log when `requeue()` does not 
find the key in `queuedWork`.
   
   ```patch
   diff --git a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   index 9466933..29c42d6 100644
   --- a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   +++ b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   @@ -126,6 +126,7 @@ public class NotificationProcessor implements AutoCloseable {
    
        public boolean requeue(RowColumn rowCol, FutureTask<?> ft) {
          if (!queuedWork.containsKey(rowCol)) {
   +        log.debug("queuedWork did not contain " + rowCol + " not requeuing");
            return false;
          }
   ```
   
   I enabled debug logging and then ran the stress test over and over until 
the bug happened again.  After it got stuck, I saw the following in the logs.
   
   ```
   $ grep NotificationProcessor *.log
   worker1.log:2018-01-03 15:44:37,835 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:5d43:08:000000000cc4be00 count wait  not requeuing
   worker2.log:2018-01-03 15:44:05,432 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:flrf:08:0000000024477200 count wait  not requeuing
   ```
   
   The two notifications above are present in the table but are never 
processed by the workers.
   
   ```
   $ accumulo shell -u root -p secret -e 'compact -t stresso -w'
   $ fluo scan -a stresso --raw -c ntfy
   07:5d43:08:000000000cc4be00 ntfy:count:wait [] 822968-INSERT 
   07:flrf:08:0000000024477200 ntfy:count:wait [] 8796731-INSERT        
   ```
   
   I confirmed these notifications were present in the worker processes' 
`queuedWork` hashmaps by taking heap dumps.
