[
https://issues.apache.org/jira/browse/TAJO-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163708#comment-14163708
]
ASF GitHub Bot commented on TAJO-1069:
--------------------------------------
Github user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/180#discussion_r18594461
--- Diff: tajo-docs/src/main/sphinx/configuration/ha_configuration.rst ---
@@ -132,4 +132,15 @@ If you want to initiate HA information, execute ``tajo
haadmin -formatHA`` ::
.. note::
- Before format HA, you must shutdown the tajo cluster.
\ No newline at end of file
+ Before formatting HA, you must shut down the Tajo cluster.
+
+
+================================================
+ How to Test Automatic Failover
+================================================
+
+If you want to verify automatic failover of TajoMaster, you must deploy
your Tajo cluster with TajoMaster HA enabled. Then, find out which node is
active from the Tajo web UI.
+
+Once you find your active TajoMaster, you can cause a failure on that
node. For example, you can use ``kill -9 <pid of TajoMaster>`` to simulate a JVM
crash, or you can shut down the machine or disconnect its network interface.
The backup TajoMaster will then become active automatically within 5 seconds. The
amount of time required to detect a failure and trigger a failover depends on
the config ``tajo.master.ha.monitor.interval``. If there are running queries,
they will finish successfully, because your TajoClient gets the result data
from TajoWorker. However, you cannot find the earlier query history, because
TajoMaster stores query history in memory, so the new active master cannot
access the history held by the failed master. If there is no running query,
the automatic failover also completes successfully.
+
+For reference, TajoMaster HA doesn't consider TajoWorker failure. It is
related with TajoResourceManager and QueryMaster.
--- End diff --
Note that TajoMaster HA does not consider TajoWorker failure. It guarantees
the high availability of both TajoResourceManager and QueryMaster.
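To make the ``kill -9`` step in the section above concrete, here is a minimal shell sketch. It assumes you run it on the active TajoMaster host, that the JDK's ``jps`` tool is on the PATH, and that ``jps`` lists the process under the name ``TajoMaster``. The helper names ``find_pid_in_jps`` and ``crash_local_tajomaster`` are hypothetical and not part of Tajo.

```shell
# Sketch only: both helper names are hypothetical, not part of Tajo.

# Print the PID of the first JVM whose name equals $1, reading
# jps-style lines ("<pid> <name>") from standard input.
find_pid_in_jps() {
  awk -v name="$1" '$2 == name { print $1; exit }'
}

# Simulate a JVM crash of the local TajoMaster with SIGKILL.
# Assumption: jps reports the process under the name "TajoMaster".
crash_local_tajomaster() {
  pid="$(jps | find_pid_in_jps TajoMaster)"
  if [ -z "$pid" ]; then
    echo "TajoMaster is not running on this host" >&2
    return 1
  fi
  kill -9 "$pid" && echo "killed TajoMaster (pid $pid)"
}
```

After sourcing the snippet, calling ``crash_local_tajomaster`` on the active node should let you watch the backup take over in the web UI. Keeping the parsing separate from the kill makes the parsing step easy to check without a running cluster.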
> Add document to explain High Availability support
> -------------------------------------------------
>
> Key: TAJO-1069
> URL: https://issues.apache.org/jira/browse/TAJO-1069
> Project: Tajo
> Issue Type: Sub-task
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Mai Hai Thanh
> Assignee: Jaehwa Jung
> Fix For: 0.9.0
>
>
> High Availability (HA) support is important for large-scale and distributed
> systems like Tajo. As far as I know, Tajo supports HA at least for TajoMaster
> (TAJO-704). However, it is not clear how HA is supported for other components
> and how Tajo reacts in different situations. The documentation should
> cover this. For example, it could answer the following (or
> more) questions.
> + What happens if TajoMaster crashes? Consider both cases:
> - When there is no query running.
> - When there is one (or more) query running.
>
> + What happens if a TajoWorker crashes? Consider both cases:
> - When there is no query running.
> - When there is one (or more) query running.
> For the above questions, the case when there is a running query is very
> important because we say "... Tajo is designed for both interactive and
> *batch* queries ... Tajo provides fault-tolerance ... for *long-running
> queries* ...".