keith-turner commented on issue #992: Notifications never processed while running stress test
URL: https://github.com/apache/fluo/issues/992#issuecomment-355066495

I was able to confirm that NotificationTracker.requeue() not being synchronized while accessing a hashmap caused the problem. I suspected that when `containsKey()` is called on the hashmap outside of a synchronized block, it could return false even though the map does contain the key. This could happen if another thread is rehashing the map. I wrote a stand-alone test with two threads and a hashmap to confirm this. One thread would constantly call `containsKey()` for a key known to be in the map while another thread was constantly inserting data. Sometimes the `containsKey()` call would return false even though the map contained the key.

I patched the code in the following way.

```patch
diff --git a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
index 9466933..29c42d6 100644
--- a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
+++ b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
@@ -126,6 +126,7 @@ public class NotificationProcessor implements AutoCloseable {
     public boolean requeue(RowColumn rowCol, FutureTask<?> ft) {
       if (!queuedWork.containsKey(rowCol)) {
+        log.debug("queuedWork did not contain " + rowCol + " not requeuing");
         return false;
       }
```

I enabled debug logging and then ran the stress test over and over until the bug happened again. After it got stuck, I saw the following in the logs.
```
$ grep NotificationProcessor *.log
worker1.log:2018-01-03 15:44:37,835 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:5d43:08:000000000cc4be00 count wait not requeuing
worker2.log:2018-01-03 15:44:05,432 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:flrf:08:0000000024477200 count wait not requeuing
```

The two notifications above are present in the table, but are never processed by the workers.

```
$ accumulo shell -u root -p secret -e 'compact -t stresso -w'
$ fluo scan -a stresso --raw -c ntfy
07:5d43:08:000000000cc4be00 ntfy:count:wait [] 822968-INSERT
07:flrf:08:0000000024477200 ntfy:count:wait [] 8796731-INSERT
```

I confirmed these notifications were present in the worker processes' `queuedWork` hashmaps by taking heap dumps.
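
For readers who want to reproduce the effect, here is a minimal sketch of a two-thread test of the kind described above. It is not the original stand-alone test. The names `HashMapRaceDemo`, `countMisses`, `knownKey`, and the insert counts are all hypothetical, chosen only for illustration. One thread repeatedly calls `containsKey()` for a key that is always in the map while another thread keeps inserting, and the `useLock` flag switches both threads to a shared lock for comparison.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

public class HashMapRaceDemo {

  /**
   * Counts how often containsKey() misses a key that is always in the map
   * while a second thread keeps inserting new entries.
   * (Hypothetical helper, not part of Fluo.)
   */
  static long countMisses(boolean useLock, int inserts) throws InterruptedException {
    final Map<Integer, Integer> map = new HashMap<>();
    final Object lock = new Object();
    final Integer knownKey = -1;
    map.put(knownKey, 0); // present before either thread starts, never removed

    final AtomicBoolean done = new AtomicBoolean(false);
    final long[] misses = new long[1]; // written only by the reader thread

    Thread reader = new Thread(() -> {
      while (!done.get()) {
        boolean found;
        if (useLock) {
          synchronized (lock) {
            found = map.containsKey(knownKey);
          }
        } else {
          found = map.containsKey(knownKey); // racy read, no synchronization
        }
        if (!found) {
          misses[0]++;
        }
      }
    });

    Thread writer = new Thread(() -> {
      for (int i = 0; i < inserts; i++) {
        if (useLock) {
          synchronized (lock) {
            map.put(i, i);
          }
        } else {
          map.put(i, i); // map growth periodically triggers a rehash
        }
      }
      done.set(true);
    });

    reader.start();
    writer.start();
    writer.join();
    reader.join();
    return misses[0]; // visible here because join() orders the reader's writes
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("unsynchronized misses: " + countMisses(false, 2_000_000));
    System.out.println("synchronized misses:   " + countMisses(true, 2_000_000));
  }
}
```

The unsynchronized count is timing dependent and may well be zero on a given run or JVM, so it can take several runs to observe a miss. With the lock held around both the read and the insert, the count should always be zero, which is consistent with the diagnosis that `requeue()` needs to access the map under synchronization.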
