[ https://issues.apache.org/jira/browse/HAWQ-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370177#comment-15370177 ]

ASF GitHub Bot commented on HAWQ-901:
-------------------------------------

GitHub user radarwave opened a pull request:

    https://github.com/apache/incubator-hawq/pull/783

    HAWQ-901 Add retries to standby master start check

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/radarwave/incubator-hawq standbyup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #783
    
----
commit c6ba62afc5e8ca61ce47efe647fe252402b59721
Author: rlei <[email protected]>
Date:   2016-07-11T02:22:29Z

    HAWQ-901 Add retries to standby master start check

----


> hawq init failed: hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running
> -------------------------------------------------------------------------------------
>
>                 Key: HAWQ-901
>                 URL: https://issues.apache.org/jira/browse/HAWQ-901
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Command Line Tools
>    Affects Versions: 2.0.0.0-incubating
>            Reporter: Ming LI
>            Assignee: Radar Lei
>
> Error message in ~/hawqAdminLogs/hawq_init_XXXXXXXX.log
> ------------------------------------------------------------------------------------
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start hawq with args: ['start', 'standby']
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Gathering information and validating the environment...
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start standby master service
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[INFO]:-Checking standby master status
> 20160706:06:45:55:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-Monitoring logs
> 20160706:06:46:00:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-checking if syncmaster is running
> 20160706:06:46:02:004418 hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[ERROR]:-Standby master start failed, exit
> 20160706:06:46:02:003999 hawqinit.sh:test5:gpadmin-[ERROR]:-Start HAWQ standby failed
> ------------------------------------------------------------------------------
> (1) I suspect the root cause may be that we wait only 5 seconds before 
> checking the standby's running status, and this interval is too short. Could 
> you first replace the single check after a fixed 5-second wait with a retry 
> loop, like the recovery status check on the master? 
> (2) If the 'syncmaster not running' error causes init to fail, it should be 
> logged as [ERROR] rather than [WARNING]. 
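The two suggestions above (poll in a loop instead of checking once after a fixed 5-second wait, and report the final failure at ERROR level) could look roughly like the sketch below. This is a hypothetical illustration, not the code from pull request #783: `wait_until_running`, its `retries`/`interval` defaults, the logger name, and the injected `is_running` check are all assumptions, and the real check in `hawqstandbywatch.py` may work differently.

```python
import logging
import time
from typing import Callable

# Hypothetical logger name; the real HAWQ tools use their own logging setup.
logger = logging.getLogger("hawq_standby_sketch")


def wait_until_running(is_running: Callable[[], bool],
                       retries: int = 30,
                       interval: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call is_running() up to `retries` times, sleeping `interval` seconds
    between attempts. Return True on the first success, False otherwise.

    `is_running` is a hypothetical stand-in for "is syncmaster running?".
    `sleep` is injectable so the retry logic can be tested without waiting.
    """
    for attempt in range(1, retries + 1):
        if is_running():
            logger.info("syncmaster is running (check %d of %d)", attempt, retries)
            return True
        if attempt < retries:
            logger.info("syncmaster not running yet, retrying in %.1f s", interval)
            sleep(interval)
    # Suggestion (2): this outcome fails the standby start, so log it as ERROR.
    logger.error("syncmaster not running after %d checks", retries)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    state = {"calls": 0}

    def fake_check() -> bool:
        # Simulated process that only becomes ready on the third check.
        state["calls"] += 1
        return state["calls"] >= 3

    print(wait_until_running(fake_check, retries=5, interval=0.1))
```

With these assumed defaults the total wait is bounded by `(retries - 1) * interval` seconds plus the time spent in the checks themselves, so a slow-starting standby gets far more than 5 seconds before the start is declared failed.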



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
