[ https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450871#comment-17450871 ]
ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------
Commit 6d81fe44b51844942bc8433931d663531547c4b8 in kudu's branch
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6d81fe4 ]
KUDU-1959 - Add tests for /startup page and metrics for tservers
This patch implements tests for the /startup page using a mini tablet
server:
- We inject latency into tablet bootstrapping while reading the webpage
every 10 milliseconds, validating the status of each step.
- We fail a data directory and validate the status of each startup step.
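The first scenario amounts to a polling loop. Below is a minimal Python sketch of that idea, not the test from this commit: `poll_startup`, the callable `fetch_status`, the step names and the status strings are all hypothetical stand-ins for fetching and parsing the /startup page. It polls every 10 milliseconds and fails if any step's status ever goes backwards.

```python
import time

# Hypothetical status labels and their order; the real /startup page may
# use different wording.
STATUS_ORDER = {"not started": 0, "in progress": 1, "completed": 2}


def poll_startup(fetch_status, interval_s=0.010, timeout_s=5.0):
    """Poll fetch_status() every interval_s seconds until all steps report
    "completed", failing if any step's status ever moves backwards.

    fetch_status is a hypothetical callable returning {step: status}.
    Returns the number of polls it took.
    """
    highest = {}  # highest status rank seen so far for each step
    polls = 0
    deadline = time.monotonic() + timeout_s
    while True:
        statuses = fetch_status()
        polls += 1
        for step, status in statuses.items():
            rank = STATUS_ORDER[status]
            if rank < highest.get(step, 0):
                raise AssertionError(f"step {step!r} went back to {status!r}")
            highest[step] = rank
        if statuses and all(s == "completed" for s in statuses.values()):
            return polls
        if time.monotonic() > deadline:
            raise TimeoutError("startup did not complete before the timeout")
        time.sleep(interval_s)


# Usage with canned snapshots standing in for successive page reads.
snapshots = iter([
    {"open_fs": "in progress", "bootstrap_tablets": "not started"},
    {"open_fs": "completed", "bootstrap_tablets": "in progress"},
    {"open_fs": "completed", "bootstrap_tablets": "completed"},
])
print(poll_startup(lambda: next(snapshots)))  # prints 3
```

Tracking the highest rank per step, rather than only the last snapshot, is what lets the loop catch a regression that happens between any two reads.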
In both scenarios we also validate the following startup metrics
(the log_block_manager_* metrics apply only when the log block manager is used):
- log_block_manager_total_containers_startup
- log_block_manager_processed_containers_startup
- log_block_manager_containers_processing_time_startup
- tablets_num_total_startup
- tablets_num_opened_startup
- tablets_opening_time_startup
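The kind of consistency check these metrics allow can be sketched as follows. This is an illustrative Python sketch: the metric names come from the list above, but `check_startup_metrics` is hypothetical, the invariants (an opened/processed count never exceeds its total and matches it once startup finishes; timers are non-negative) are assumptions rather than the commit's actual assertions, and the snapshot values are made up.

```python
def check_startup_metrics(metrics, startup_done, uses_log_block_manager=True):
    """Return a list of violated invariants for one snapshot of metrics.

    metrics maps metric name -> number. The invariants are assumptions:
    a processed/opened count never exceeds its total, equals it once
    startup is done, and the timing metrics are never negative.
    """
    problems = []
    pairs = [("tablets_num_opened_startup", "tablets_num_total_startup")]
    timers = ["tablets_opening_time_startup"]
    if uses_log_block_manager:
        # These metrics apply only when the log block manager is in use.
        pairs.append(("log_block_manager_processed_containers_startup",
                      "log_block_manager_total_containers_startup"))
        timers.append("log_block_manager_containers_processing_time_startup")
    for done, total in pairs:
        if metrics[done] > metrics[total]:
            problems.append(f"{done} > {total}")
        if startup_done and metrics[done] != metrics[total]:
            problems.append(f"{done} != {total} after startup")
    for name in timers:
        if metrics[name] < 0:
            problems.append(f"{name} is negative")
    return problems


# Made-up snapshot: all containers processed, but 3 tablets not yet opened.
snapshot = {
    "log_block_manager_total_containers_startup": 40,
    "log_block_manager_processed_containers_startup": 40,
    "log_block_manager_containers_processing_time_startup": 120,
    "tablets_num_total_startup": 10,
    "tablets_num_opened_startup": 7,
    "tablets_opening_time_startup": 900,
}
print(check_startup_metrics(snapshot, startup_done=True))
# ['tablets_num_opened_startup != tablets_num_total_startup after startup']
```

The `startup_done` flag lets the same check run mid-startup (only the upper bounds apply) and at the end (counts must match).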
Additionally, we fix a race condition in the Kudu tablet server WebUI.
The race occurs if the tablet server is started while the WebUI is
being curled continuously. The cause appears to be that the webserver
is started before the path handlers are registered, an ordering
introduced as part of the change https://gerrit.cloudera.org/#/c/17730/
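The ordering problem can be illustrated with a toy, single-process model. In this hypothetical sketch, `ToyWebserver` and `bring_up` are illustrative names and not Kudu's C++ webserver API: a request that arrives between `start()` and handler registration gets a 404, while registering every handler before starting closes that window.

```python
import threading


class ToyWebserver:
    """Hypothetical stand-in for a webserver: refuses connections until
    started, and answers 404 for paths with no registered handler."""

    def __init__(self):
        self._lock = threading.Lock()  # handlers may be read by serving threads
        self._handlers = {}
        self._started = False

    def register_path_handler(self, path, handler):
        with self._lock:
            self._handlers[path] = handler

    def start(self):
        with self._lock:
            self._started = True

    def get(self, path):
        with self._lock:
            if not self._started:
                raise ConnectionRefusedError(path)
            handler = self._handlers.get(path)
        if handler is None:
            return 404, "not found"
        return 200, handler()


def bring_up(server, handlers, register_first, probe):
    """Start `server` and register `handlers` in the chosen order.
    `probe` runs right after start() to model a client request that
    lands at exactly that moment; its result is returned."""
    if register_first:
        for path, fn in handlers.items():
            server.register_path_handler(path, fn)
    server.start()
    result = probe()
    if not register_first:
        for path, fn in handlers.items():
            server.register_path_handler(path, fn)
    return result


handlers = {"/startup": lambda: "ok"}
racy = ToyWebserver()
print(bring_up(racy, handlers, False, lambda: racy.get("/startup")))  # (404, 'not found')
fixed = ToyWebserver()
print(bring_up(fixed, handlers, True, lambda: fixed.get("/startup")))  # (200, 'ok')
```

A real server handles requests on other threads, which is why the toy guards its state with a lock; the `probe` argument simply pins the client request to the vulnerable moment so the race is reproducible.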
Change-Id: I9f432b4eb813e51214b4d6b3c5b7b4c89426f47f
Reviewed-on: http://gerrit.cloudera.org:8080/17990
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Andrew Wong <[email protected]>
> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
> Key: KUDU-1959
> URL: https://issues.apache.org/jira/browse/KUDU-1959
> Project: Kudu
> Issue Type: Improvement
> Components: ops-tooling
> Reporter: Jean-Daniel Cryans
> Assignee: Abhishek
> Priority: Major
> Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell when
> it's "done". Right now, this is what I do:
> - Run ksck, wait until most tablets are not in "unavailable" or
> "bootstrapping" state.
> - Watch the metrics and see when the data under management is close to where
> it was before restarting (it grows as tablets are getting bootstrapped).
> - Look at the tablet server web UIs for tablets, compare how many are done
> bootstrapping vs. in the process vs. not started.
> Ideas on how to improve this:
> - In the master's web UI for tablet servers, show how many tablets are
> running vs. not running (I wouldn't add anything about tombstoned tablets).
> - Add metrics for tablets in different states.
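The "metrics for tablets in different states" idea boils down to a per-state count. A minimal sketch, assuming the server can list each tablet with a state label; `summarize` and the labels (`RUNNING`, `BOOTSTRAPPING`, `NOT_STARTED`, `TOMBSTONED`) are illustrative, not Kudu's actual API. Tombstoned tablets are excluded, as the issue suggests.

```python
from collections import Counter

RUNNING = "RUNNING"        # illustrative state labels, not Kudu's API
TOMBSTONED = "TOMBSTONED"


def summarize(tablet_states):
    """tablet_states maps tablet id -> state label.

    Returns (count per state, running, not running), skipping
    tombstoned tablets as the issue suggests.
    """
    counts = Counter(s for s in tablet_states.values() if s != TOMBSTONED)
    running = counts[RUNNING]  # Counter returns 0 for a missing key
    not_running = sum(counts.values()) - running
    return dict(counts), running, not_running


states = {"t1": "RUNNING", "t2": "BOOTSTRAPPING", "t3": "RUNNING",
          "t4": "NOT_STARTED", "t5": "TOMBSTONED"}
print(summarize(states))
# ({'RUNNING': 2, 'BOOTSTRAPPING': 1, 'NOT_STARTED': 1}, 2, 2)
```

In this simplified model, a tablet server would count as done starting up once `not_running` reaches zero.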