[
https://issues.apache.org/jira/browse/TAJO-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162213#comment-14162213
]
ASF GitHub Bot commented on TAJO-1069:
--------------------------------------
Github user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/180#discussion_r18537424
--- Diff: tajo-docs/src/main/sphinx/configuration/ha_configuration.rst ---
@@ -132,4 +132,16 @@ If you want to initiate HA information, execute ``tajo haadmin -formatHA`` ::
.. note::
- Before format HA, you must shutdown the tajo cluster.
\ No newline at end of file
+ Before format HA, you must shutdown the Tajo cluster.
+
+
+================================================
+ Verify Automatic Failover
+================================================
+
+If you want to verify automatic failover, you must deploy your Tajo cluster with TajoMaster HA enable. And then, you
+need to find which node is active by visiting the Tajo web interfaces.
+
+Once you have located your active TajoMaster, you can cause a failure on that node. For example, you can use kill -9 <pid of TajoMaster> to simulate a JVM crash. Or you can shutdown the machine or disconnect network interface. And then, the backup TajoMaster should automatically become active within 5 seconds. The amount of time required to detect a failure and trigger a failover depends on the configuration of ``tajo.master.ha.monitor.interval``. If there is running queries, it will be finished successfully. Because your TajoClient will get the result data on TajoWorker. But you can't find already query history. Because TajoMaster stores query history on memory. So, the other master can't access already active master query history. And if there is no running query, the automatic failover run successfully.
--- End diff --
- s/have located/find/
- s/should automatically become active within 5 seconds./will be
automatically active within 5 seconds./
- s/configuration/config ``tajo.master.ha ...``/
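
The manual check described in the diff (kill the active TajoMaster, then confirm the backup takes over within about 5 seconds) could be scripted roughly as below. This is only a sketch under assumptions: ``wait_for_failover`` and the ``get_active_master`` probe are hypothetical names, not part of Tajo, and how the probe learns which master is active (for example, by reading the Tajo web interface) is left to the caller.

```python
import time
from typing import Callable, Optional


def wait_for_failover(
    get_active_master: Callable[[], Optional[str]],  # hypothetical probe supplied by the caller
    failed_master: str,
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.1,
) -> Optional[float]:
    """Poll until a master other than `failed_master` is reported active.

    Returns the elapsed seconds on success, or None if `timeout_s` passes first.
    """
    start = time.monotonic()
    while True:
        active = get_active_master()  # may return None while no master is active
        elapsed = time.monotonic() - start
        if active is not None and active != failed_master:
            return elapsed
        if elapsed >= timeout_s:
            return None
        time.sleep(poll_interval_s)
```

After killing the active TajoMaster, one would call ``wait_for_failover(probe, "<address of killed master>")`` and treat a ``None`` result as a failed verification. The 5-second default only mirrors the figure quoted in the diff; the actual detection time depends on ``tajo.master.ha.monitor.interval``.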
> Add document to explain High Availability support
> -------------------------------------------------
>
> Key: TAJO-1069
> URL: https://issues.apache.org/jira/browse/TAJO-1069
> Project: Tajo
> Issue Type: Sub-task
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Mai Hai Thanh
> Assignee: Jaehwa Jung
> Fix For: 0.9.0
>
>
> High Availability (HA) support is important for large-scale, distributed
> systems like Tajo. As far as I know, Tajo supports HA at least for TajoMaster
> (TAJO-704). However, it is not clear how HA is supported for other components
> or how Tajo reacts in different situations. The documentation should cover
> this. For example, it could answer the following (or more) questions.
> + What happens if TajoMaster crashes? In both cases:
> - When there is no query running.
> - When there are one or more queries running.
>
> + What happens if a TajoWorker crashes? In both cases:
> - When there is no query running.
> - When there are one or more queries running.
> For the above questions, the case when there is a running query is very
> important because we say "... Tajo is designed for both interactive and
> *batch* queries ... Tajo provides fault-tolerance ... for *long-running
> queries* ...".