[ 
https://issues.apache.org/jira/browse/TAJO-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163708#comment-14163708
 ] 

ASF GitHub Bot commented on TAJO-1069:
--------------------------------------

Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/180#discussion_r18594461
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/ha_configuration.rst ---
    @@ -132,4 +132,15 @@ If you want to initiate HA information, execute ``tajo haadmin -formatHA`` ::
     
     .. note::
     
    -  Before format HA, you must shutdown the tajo cluster.
    \ No newline at end of file
    +  Before format HA, you must shutdown the Tajo cluster.
    +
    +
    +================================================
    +  How to Test Automatic Failover
    +================================================
    +
    +If you want to verify automatic failover of TajoMaster, you must deploy your Tajo cluster with TajoMaster HA enable. And then, you need to find which node is active from Tajo web UI.
    +
    +Once you find your active TajoMaster, you can cause a failure on that node. For example, you can use kill -9 <pid of TajoMaster> to simulate a JVM crash. Or you can shutdown the machine or disconnect network interface. And then, the backup TajoMaster will be automatically active within 5 seconds. The amount of time required to detect a failure and  trigger a failover depends on the config ``tajo.master.ha.monitor.interval``. If there is running queries, it will be finished successfully. Because your TajoClient will get the result data on TajoWorker. But you can't find already query history. Because TajoMaster stores query history on memory. So, the other master can't access already active master query history. And if there is no running query, the automatic failover run successfully.
    +
    +For reference, TajoMaster HA doesn't consider TajoWorker failure. It is related with TajoResourceManager and QueryMaster.
    --- End diff --
    
    Note that TajoMaster HA does not consider TajoWorker failure. It guarantees 
the high availability of both TajoResourceManager and QueryMaster.
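
    As a side note on why ``kill -9`` is a reasonable stand-in for a JVM crash: SIGKILL cannot be caught, so the target process gets no chance to run shutdown logic. The sketch below is only an illustration under assumptions. It uses a sleeping child process as a hypothetical stand-in for the active TajoMaster, and the helper name ``simulate_crash`` is made up for this example. It does not touch a real Tajo cluster and assumes a POSIX system.

```python
import signal
import subprocess
import sys


def simulate_crash(proc: subprocess.Popen) -> int:
    """Send SIGKILL (the signal behind `kill -9`) and return the exit status."""
    proc.send_signal(signal.SIGKILL)
    # wait() reaps the child; on POSIX a signal death is reported as -signum.
    return proc.wait(timeout=5)


# Hypothetical stand-in for the active TajoMaster JVM: a child that just sleeps.
stand_in = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)

status = simulate_crash(stand_in)
# A negative return code means the process was terminated by a signal
# instead of exiting on its own.
print("killed by signal:", -status)
```

    On a real cluster the equivalent step would be ``kill -9 <pid of TajoMaster>`` on the active node, followed by checking the Tajo web UI to see which master has become active.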


> Add document to explain High Availability support
> -------------------------------------------------
>
>                 Key: TAJO-1069
>                 URL: https://issues.apache.org/jira/browse/TAJO-1069
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 0.9.0
>            Reporter: Mai Hai Thanh
>            Assignee: Jaehwa Jung
>             Fix For: 0.9.0
>
>
> High Availability (HA) support is important for large-scale distributed 
> systems like Tajo. As far as I know, Tajo supports HA at least for TajoMaster 
> (TAJO-704). However, it is not clear how HA is supported for other components 
> or how Tajo reacts in different situations. The documentation should 
> cover this. For example, it could answer the following (or 
> more) questions:
> + What happens if TajoMaster crashes? In both cases:
>    - When there is no query running.
>    - When there are one or more queries running.
>    
> + What happens if a TajoWorker crashes? In both cases:
>    - When there is no query running.
>    - When there are one or more queries running.
> For the above questions, the case where a query is running is very 
> important because we say "... Tajo is designed for both interactive and 
> *batch* queries ... Tajo provides fault-tolerance ... for *long-running 
> queries* ...".


