keith-turner commented on a change in pull request #142: Added troubleshooting 
documentation
URL: https://github.com/apache/fluo-website/pull/142#discussion_r174281774
 
 

 ##########
 File path: _fluo-1-2/administration/troubleshooting.md
 ##########
 @@ -0,0 +1,56 @@
+---
+title: Troubleshooting
+category: administration
+order: 7
+---
+
+Steps for troubleshooting problems with Fluo applications.
+
+## Fluo application stops processing data
+
+1. Confirm that your application is running with the expected number of 
workers. 
+    ```bash
+    $ fluo list
+    Fluo instance (localhost/fluo) contains 1 application(s)
+
+    Application     Status     # Workers
+    -----------     ------     ---------
+    webindex        RUNNING        3
+    ```
+   Look for errors in the logs of any oracle or worker that has died.
+
+1. Run the `fluo wait` command to see if your application is processing 
notifications. 
+    ```bash
+    $ fluo wait -a webindex
+    [command.FluoWait] INFO : The wait command will exit when all 
notifications are processed
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : 96 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : 70 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : 31 notifications are still outstanding.  Will 
try again in 10 seconds...
+    [command.FluoWait] INFO : All processing has finished!
+    ```
+   The number of notifications will increase as data is added to the 
application but they should eventually decrease
+   to zero and processing should finish.
+
+1. Look for errors or exceptions in the logs of all oracle and worker 
processes. Processing can stop if all threads
+   in a worker process were consumed by exceptions thrown in the Fluo 
application's observer code. These exceptions
+   are often due to parsing issues or corner cases not seen during development 
or when using small data sets.
+
+1. If you are using a cluster manager (e.g., Marathon or YARN) to run your 
Fluo application, look for errors in the logs of
+   your cluster manager or application manager.  Below are some common errors: 
+
+    * Cluster managers sometimes fail to start all processes of a Fluo application 
due to a lack of container slots or resources (CPU, memory, etc).
+      This can be fixed by giving more resources to your cluster manager or 
decreasing the number/resources of Fluo workers.
+    * Cluster managers can kill Fluo processes if they use too much memory. 
This can be fixed by allocating more memory to your workers.
+
+1. Run [jstack] to get stack traces of threads in your Fluo application 
processes and look for any dead locks.
 
 Review comment:
   I would say stuck threads instead of dead locks.  Threads can get stuck for 
many reasons: I/O, deadlock, livelock, an unavailable remote resource, etc.
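
   One way to spot stuck threads, whatever the cause, is to take a few thread dumps some seconds apart and compare them. The following is a rough sketch of that idea, not something the PR or Fluo provides: `dump_stacks`, `summarize_states`, the `JSTACK` override, and the output file names are all made up for illustration, and it assumes `jstack` from a JDK is on the `PATH`.

```bash
#!/usr/bin/env bash
# Sketch only: dump_stacks, summarize_states, the JSTACK override, and the
# output file names are hypothetical helpers, not part of Fluo or the JDK.

# Usage: dump_stacks <pid> [count] [interval_seconds] [outdir]
# Writes <count> thread dumps of the JVM <pid>, <interval_seconds> apart.
dump_stacks() (
  pid="$1"; count="${2:-3}"; interval="${3:-5}"; outdir="${4:-.}"
  jstack_cmd="${JSTACK:-jstack}"   # assumes jstack from a JDK is on PATH
  mkdir -p "$outdir" || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    "$jstack_cmd" "$pid" > "$outdir/jstack.$pid.$i.txt" || exit 1
    # Pause between dumps, but not after the last one.
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
)

# Usage: summarize_states <dump-file>...
# Counts threads per state (RUNNABLE, BLOCKED, WAITING, ...) across the dumps.
summarize_states() {
  grep -h 'java.lang.Thread.State:' "$@" | sed 's/^[[:space:]]*//' | sort | uniq -c
}
```

   For example, `dump_stacks <worker-pid> 3 5 /tmp/dumps` followed by `summarize_states /tmp/dumps/*.txt` gives a quick count of thread states. A thread whose stack is identical in every dump is a candidate for being stuck, and the frames at the top of its stack usually hint at whether it is waiting on I/O, a lock, or a remote resource.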
