Victoria Markman created DRILL-4276:
---------------------------------------

             Summary: Need a way to check on status of drillbits
                 Key: DRILL-4276
                 URL: https://issues.apache.org/jira/browse/DRILL-4276
             Project: Apache Drill
          Issue Type: New Feature
          Components: Execution - Monitoring
            Reporter: Victoria Markman


I ran into a situation where a cluster started with 8 nodes and 2 of them went 
down for some reason. 

As a user, my only ways to detect this situation are:

* a query fails because something started executing on a node that then went 
down (and I have to comb through the logs to find the warning)
* my queries are extremely slow, because they started executing after the 
node went down and was deregistered from ZooKeeper
* somebody simply stopped the drillbit on a particular node

Since there is no central place (apart from ZooKeeper) where information on 
participating nodes is kept, when I queried sys.drillbits I got 6 nodes, as if 
the other 2 had never existed. There is beauty in flexibility, but in a real-life 
situation with more than 20 nodes, things can get out of control quickly.

Since ZooKeeper has this information in the first place, can we enhance the 
sys.drillbits table to show each drillbit's status as ZooKeeper sees it?

This would also help with testing, in particular with automating test cases 
that exercise failure conditions like the one above.
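
To illustrate the kind of check a test has to do by hand today, here is a rough 
sketch in Python. It is not part of Drill. The helper name missing_drillbits and 
the example host names are made up, and it assumes the caller keeps its own list 
of expected hosts and has already fetched the hostname values from sys.drillbits 
through whatever client it uses.

```python
def missing_drillbits(expected_hosts, registered_hosts):
    """Return the expected hosts that are absent from the registered list.

    expected_hosts:   hosts the operator believes are in the cluster
                      (hypothetical, kept outside of Drill).
    registered_hosts: hostname values read from sys.drillbits (assumed to be
                      fetched by the caller).
    """
    # Normalize so that case or stray whitespace does not cause false alarms.
    registered = {h.strip().lower() for h in registered_hosts}
    return sorted(h for h in expected_hosts if h.strip().lower() not in registered)


if __name__ == "__main__":
    # Mirrors the report: 8 nodes expected, only 6 still registered.
    expected = [f"node{i}.example.com" for i in range(1, 9)]
    registered = expected[:6]
    print(missing_drillbits(expected, registered))
    # prints ['node7.example.com', 'node8.example.com']
```

With a status column in sys.drillbits, the separately maintained expected list 
would no longer be needed, since a down node would still show up in the table.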



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
