[ https://issues.apache.org/jira/browse/HAWQ-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368924#comment-15368924 ]
Radar Lei commented on HAWQ-901:
--------------------------------
Since the recent standby changes, standby start can in some cases take
longer, which causes the standby start to fail with a timeout. The current
timeout is too small; I plan to add a retry loop to fix it.
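
A minimal sketch of what such a retry loop could look like. This is not the
actual patch: the function name wait_for_syncmaster, the is_running callable,
and the default retry count and interval are all assumptions made for
illustration and are not taken from hawqstandbywatch.py.

```python
import time


def wait_for_syncmaster(is_running, retries=10, interval=1.0, sleep=time.sleep):
    """Poll is_running() until it returns True or the retries are exhausted.

    is_running: hypothetical callable returning True once syncmaster is up.
    retries:    maximum number of checks (illustrative default).
    interval:   seconds to wait between checks (illustrative default).
    sleep:      injectable sleep function, so the loop can be tested quickly.
    """
    for attempt in range(1, retries + 1):
        if is_running():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < retries:
            sleep(interval)
    return False
```

With a loop like this, a standby that needs a few extra seconds to start is
reported as failed only after every attempt has been used, rather than after
a single fixed wait.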
> hawq init failed: hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not
> running
> -------------------------------------------------------------------------------------
>
> Key: HAWQ-901
> URL: https://issues.apache.org/jira/browse/HAWQ-901
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Command Line Tools
> Reporter: Ming LI
> Assignee: Radar Lei
>
> Error message in ~/hawqAdminLogs/hawq_init_XXXXXXXX.log
> ------------------------------------------------------------------------------------
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start hawq with
> args: ['start', 'standby']
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Gathering
> information and validating the environment...
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start standby
> master service
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[INFO]:-Checking standby
> master status
> 20160706:06:45:55:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-Monitoring
> logs
> 20160706:06:46:00:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-checking
> if syncmaster is running
> 20160706:06:46:02:004418
> hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[ERROR]:-Standby master
> start failed, exit
> 20160706:06:46:02:003999 hawqinit.sh:test5:gpadmin-[ERROR]:-Start HAWQ
> standby failed
> ------------------------------------------------------------------------------
> (1) I suspect the root cause may be that we wait only 5 seconds before
> checking the standby running status; this interval is too small. Could you
> please first change the standby running status check from a fixed 5-second
> wait to a loop, like the recovery running status check on the master?
> (2) If the error 'syncmaster not running' leads to init failure, we
> should change it from [WARNING] to [ERROR].