Re: [capistrano-mailing-list] Service restart failed after deploy task

niristotle okram Fri, 29 May 2015 16:35:08 -0700

A quick question, if I may. I have 4 nodes, and if one of them is 
offline, it appears that the entire job fails. Is this by design? Should I 
put in timeout values so the job moves on? 
I thought the commands were executed in parallel.




Deploying the ABC application via Capistrano
ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
  * executing `staging'
ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
    triggering start callbacks for `deploy'
  * executing `multistage:ensure'
  * executing `deploy'
    triggering before callbacks for `deploy'
  * executing `deploy:stop_app'
  * executing multiple commands in parallel
    servers: ["node1", "node2", "node3", "node4"]
connection failed for: node2 (Errno::ETIMEDOUT: Connection timed out - connect(2) for "node2" port 22)
Build step 'Execute shell' marked build as failure
Finished: FAILURE
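
One possible workaround, sketched under assumptions rather than taken from the Capistrano documentation: probe each node's SSH port from the build step before calling cap, and hand only the reachable nodes to the deploy. The helper names `probe` and `filter_reachable` are hypothetical, the 5-second limit is arbitrary, and the commented usage assumes that the Capistrano 2 release in use honours the HOSTFILTER environment variable, which should be verified first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Capistrano.

# probe HOST PORT: succeeds if a TCP connection opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
probe() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# filter_reachable HOST...: print, one per line, only the hosts whose
# SSH port (22) answers; report skipped hosts on stderr.
filter_reachable() {
  local host
  for host in "$@"; do
    if probe "$host" 22; then
      echo "$host"
    else
      echo "skipping unreachable host: $host" >&2
    fi
  done
}

# Possible use in the build step (assumption: HOSTFILTER is supported):
#   HOSTS_UP=$(filter_reachable node1 node2 node3 node4 | paste -sd, -)
#   cap staging deploy HOSTFILTER="$HOSTS_UP"
```

This only sidesteps the connection failure; a node that is skipped still has to be deployed once it is back online.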

On Wednesday, May 20, 2015 at 4:43:25 PM UTC-7, niristotle okram wrote:
>
> Hey Lee,
> just to close out this thread: 
> the cause was not cap2, per se. The process spawned by cap was being 
> killed when cap exited its task. We had to modify the init script to fix 
> this issue. 'nohup' is used to keep the daemon running in the background 
> even after cap exits the SSH session. Below is the modified portion of 
> the script. 
>
>
> case "$1" in
> start)
>     #checking to see if the process is already running, if it is, display a message and exit
>     if [ -n "`ps aux | grep /opt/mount1/oss/current/src/main.py | grep -v grep`" ]; then
>       echo "Service is already running, try stopping it first with 'service oss stop'"
>       exit
>     fi
>
>     printf "%-50s" "Starting $DAEMON_NAME..."
>     cd $DIR
>     [ -d $LOGPATH ] || mkdir $LOGPATH
>     [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
>     nohup $PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 &
>     echo $! > $PIDFILE
>     sleep 5
> ;;
>
> thanks
> Okram
>  
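The nohup pattern described above can be sketched as two small shell functions. This is only an illustration, not the actual init script: `start_detached` and `is_running` are made-up names, redirecting stdin from /dev/null is an extra precaution that was not part of the original fix, and the liveness check uses `kill -0` instead of grepping the `ps` output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the nohup approach.

# start_detached LOGFILE PIDFILE COMMAND [ARGS...]
# Runs COMMAND in the background with SIGHUP ignored and with all three
# standard streams detached from the calling session, then records its
# PID in PIDFILE.
start_detached() {
  local logfile=$1 pidfile=$2
  shift 2
  nohup "$@" >"$logfile" 2>&1 </dev/null &
  echo $! >"$pidfile"
}

# is_running PIDFILE
# Succeeds only if PIDFILE exists and the PID stored in it still refers
# to a process (kill -0 sends no signal, it only checks for existence).
is_running() {
  local pidfile=$1 pid
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example, `start_detached /tmp/demo.log /tmp/demo.pid sleep 300` followed by `is_running /tmp/demo.pid && echo Running` (the paths here are placeholders).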
>
>
>
>
> On Tuesday, May 19, 2015 at 1:28:32 PM UTC-7, Lee Hambley wrote:
>>
>> It almost certainly has something to do with your process failing to 
>> daemonize properly. Ruby processes struggle with this too; see 
>> http://stackoverflow.com/a/688448/119669. It has to do with the forked 
>> process (your Python daemon in this case) still inheriting some 
>> resources that are attached to the Capistrano session.
>>
>> This unfortunately falls outside stuff I can help you with reasonably or 
>> remotely. You *might* have some success learning enough strace to see your 
>> process, and how it behaves when Cap disconnects. 
>>
>> Lee Hambley
>> http://lee.hambley.name/
>> +49 (0) 170 298 5667
>>
>> On 19 May 2015 at 22:23, niristotle okram <nirish...@gmail.com> wrote:
>>
>>> I apologize, I made a mistake in reporting the issue. :(
>>>
>>> The original report, "xyz.pid doesn't exist, when it actually does", 
>>> is not an error. It's working as designed: the service had been manually 
>>> stopped before the task tried to stop it again, and that is why it threw 
>>> that error. The real issue is that when Capistrano starts the service as 
>>> part of the task "deploy:restart_app" after the deploy, the service 
>>> doesn't start up properly. Checking the status with "service xyz status" 
>>> after the deploy returns "Process dead but pidfile exists". 
>>>
>>> This service start failure is only seen after the cap deploy and cannot 
>>> be reproduced manually.
>>>
>>> On Tuesday, May 19, 2015 at 12:10:50 AM UTC-7, Lee Hambley wrote:
>>>>
>>>> Sorry, I can't see anything wrong with it. :-\
>>>>
>>>> Lee Hambley
>>>> http://lee.hambley.name/
>>>> +49 (0) 170 298 5667
>>>>
>>>> On 19 May 2015 at 02:26, niristotle okram <nirish...@gmail.com> wrote:
>>>>
>>>>> Hi Lee, 
>>>>>
>>>>> here is the full /etc/init.d/ script: http://pastebin.com/02G5tpgH
>>>>>
>>>>> I set up a task to stop the app, then deploy, and then start the 
>>>>> service/app. The task includes the commands below as a check:
>>>>>
>>>>> whoami    ----->   ** [out :: server4] root
>>>>> ll /var/run/  ---> this shows the xyz.pid file 
>>>>>
>>>>> So I see the service stop just fine:
>>>>>
>>>>>  ** [out :: server1] Stopping
>>>>>  ** [out :: server1] Ok
>>>>>
>>>>> And the service also starts just fine:
>>>>>
>>>>>  ** [out :: server1] Starting xyz...
>>>>>  ** [out :: server1] Ok
>>>>>
>>>>> But on checking manually with "service xyz status", I get "Process 
>>>>> dead but pidfile exists". I can stop and start the service just fine 
>>>>> manually. 
>>>>>
>>>>> On Monday, May 18, 2015 at 12:24:50 PM UTC-7, Lee Hambley wrote:
>>>>>>
>>>>>> You wrote that /etc/init.d/xyz is run via "sudo", so the deploy 
>>>>>> user apparently has access to password-less sudo (at least for some 
>>>>>> actions). That would mean the file is not visible to `root`, which I 
>>>>>> don't believe or expect.
>>>>>>
>>>>>> You included a part of /etc/init.d/xyz, but didn't include the 
>>>>>> full thing for some reason, so I can't see what the value of $PIDFILE 
>>>>>> should be in this case (*please*, adhere to the list guidelines and 
>>>>>> paste long files in an external service, and link them). You also 
>>>>>> didn't state where you got the template.
>>>>>>
>>>>>> I also don't understand the logic behind setting shell: 'bash' on 
>>>>>> the run() lines that interface with the init script.
>>>>>>
>>>>>> Your task:
>>>>>>
>>>>>> task :stop_app, :roles => :web do
>>>>>>   run "sudo /etc/init.d/xyz stop", :shell => :bash
>>>>>> end
>>>>>>
>>>>>>
>>>>>> I might suggest you extend that (or make a similar one, 
>>>>>> "debug_initd_stuff") that does something like:
>>>>>>
>>>>>> task :debug_initd_stuff, :roles => :web do
>>>>>>   run "sudo whoami"
>>>>>>   run "sudo ls -l /etc/init.d"
>>>>>>   run "sudo ls -l /var/run"
>>>>>> end
>>>>>>
>>>>>>
>>>>>> You might also want to run the init.d script through shellcheck.net, 
>>>>>> since there are quite a few violations and bad practices already in 
>>>>>> sight there; shellcheck might help you iron some of them out. (That 
>>>>>> said, honestly the problem is probably something much simpler.)
>>>>>>
>>>>>>
>>>>>> Lee Hambley
>>>>>> http://lee.hambley.name/
>>>>>> +49 (0) 170 298 5667
>>>>>>
>>>>>> On 18 May 2015 at 21:14, niristotle okram <nirish...@gmail.com> wrote:
>>>>>>
>>>>>>> Versions:
>>>>>>>
>>>>>>>    - Ruby 2.1.1
>>>>>>>    - Capistrano 2 
>>>>>>>    - Rake / Rails / etc
>>>>>>>
>>>>>>> Platform:
>>>>>>>
>>>>>>>    - Working on.... RHEL 6
>>>>>>>    - Deploying to... RHEL 6
>>>>>>>
>>>>>>>
>>>>>>> A part of the Deploy.rb:
>>>>>>>
>>>>>>> #before "deploy", "deploy:stop_app"
>>>>>>> #after "deploy", "deploy:start_app"
>>>>>>> after "deploy", "deploy:restart_app"
>>>>>>>
>>>>>>> namespace :deploy do
>>>>>>>   task :update_code, :roles => :web, :except => { :no_release => true } do
>>>>>>>     on_rollback { puts "DO NOT WANT TO ROLL BACK?" }
>>>>>>>     strategy.deploy!
>>>>>>>     finalize_update
>>>>>>>   end
>>>>>>>
>>>>>>>   task :stop_app, :roles => :web do
>>>>>>>     run "sudo /etc/init.d/xyz stop", :shell => :bash
>>>>>>>   end
>>>>>>>
>>>>>>>   task :start_app, :roles => :web do
>>>>>>>     run "sudo /etc/init.d/xyz start", :shell => :bash
>>>>>>>   end
>>>>>>>
>>>>>>>   task :restart_app, :roles => :web do
>>>>>>>     run "sudo /etc/init.d/xyz restart", :shell => :bash
>>>>>>>   end
>>>>>>> end
>>>>>>>
>>>>>>> I have this parameter in 'deploy.rb':
>>>>>>>
>>>>>>> *set :user, 'a_user'*
>>>>>>>
>>>>>>> Q: Which user performs the task that restarts the service (xyz) after 
>>>>>>> the app (xyz) is deployed? I am getting errors saying that xyz.pid 
>>>>>>> doesn't exist, when it actually does. The message comes from the part 
>>>>>>> of the shell script that stops the service.
>>>>>>>
>>>>>>>
>>>>>>> A part of the /etc/init.d/xyz
>>>>>>>
>>>>>>> case "$1" in
>>>>>>> start)
>>>>>>>         printf "%-50s" "Starting $DAEMON_NAME..."
>>>>>>>         cd $DIR
>>>>>>>         [ -d $LOGPATH ] || mkdir $LOGPATH
>>>>>>>         [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
>>>>>>>         PID=`$PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 & echo $!`
>>>>>>>         #echo "Saving PID" $PID " to " $PIDFILE
>>>>>>>         if [ -z $PID ]; then
>>>>>>>             printf "%s\n" "Fail"
>>>>>>>         else
>>>>>>>             echo $PID > $PIDFILE
>>>>>>>             printf "%s\n" "Ok"
>>>>>>>         fi
>>>>>>> ;;
>>>>>>> status)
>>>>>>>         printf "%-50s" "Checking $DAEMON_NAME..."
>>>>>>>         if [ -f $PIDFILE ]; then
>>>>>>>             PID=`cat $PIDFILE`
>>>>>>>             if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
>>>>>>>                 printf "%s\n" "Process dead but pidfile exists"
>>>>>>>             else
>>>>>>>                 echo "Running"
>>>>>>>             fi
>>>>>>>         else
>>>>>>>             printf "%s\n" "Service not running"
>>>>>>>         fi
>>>>>>> ;;
>>>>>>> stop)
>>>>>>>         printf "%-50s" "Stopping $DAEMONNAME"
>>>>>>>         PID=`cat $PIDFILE`
>>>>>>>         cd $DIR
>>>>>>>         if [ -f $PIDFILE ]; then
>>>>>>>             kill -HUP $PID
>>>>>>>             printf "%s\n" "Ok"
>>>>>>>             rm -f $PIDFILE
>>>>>>>         else
>>>>>>>             printf "%s\n" "pidfile not found"
>>>>>>>         fi
>>>>>>> ;;
>>>>>>>
>>>>>>> restart)
>>>>>>>         $0 stop
>>>>>>>         $0 start
>>>>>>> ;;
>>>>>>>
>>>>>>> *)
>>>>>>>         echo "Usage: $0 {status|start|stop|restart}"
>>>>>>>         exit 1
>>>>>>> esac
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Capistrano log 
>>>>>>>
>>>>>>>   * executing `deploy:restart_app'
>>>>>>>   * executing multiple commands in parallel
>>>>>>>     -> "else" :: "sudo /etc/init.d/xyz restart"
>>>>>>>     -> "else" :: "sudo /etc/init.d/xyz restart"
>>>>>>>     -> "else" :: "sudo /etc/init.d/xyz restart"
>>>>>>>     -> "else" :: "sudo /etc/init.d/xyz restart"
>>>>>>>     servers: ["server1", "server2", "server3", "server4"]
>>>>>>>     [server1] executing command
>>>>>>>     [server2] executing command
>>>>>>>     [server3] executing command
>>>>>>>     [server4] executing command
>>>>>>>  ** [out :: server1] Stopping
>>>>>>>  ** [out :: server1] cat: /var/run/xyz.pid: No such file or directory
>>>>>>>  ** [out :: server1] pidfile not found
>>>>>>>  ** [out :: server1] Starting xyz...
>>>>>>>  ** [out :: server2] Stopping
>>>>>>>  ** [out :: server2] cat: /var/run/xyz.pid: No such file or directory
>>>>>>>  ** [out :: server2] pidfile not found
>>>>>>>  ** [out :: server2] Starting xyz...
>>>>>>>  ** [out :: server2] Ok
>>>>>>>  ** [out :: server1] Ok
>>>>>>>  ** [out :: server3] Stopping
>>>>>>>  ** [out :: server3] cat: /var/run/xyz.pid: No such file or directory
>>>>>>>  ** [out :: server3] pidfile not found
>>>>>>>  ** [out :: server4] Stopping
>>>>>>>  ** [out :: server4] Ok
>>>>>>>  ** [out :: server3] Starting xyz...
>>>>>>>  ** [out :: server3] Ok
>>>>>>>  ** [out :: server4] Starting xyz...
>>>>>>>  ** [out :: server4] Ok
>>>>>>>     command finished in 659ms
>>>>>>> Finished: SUCCESS
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>
>>>>>>> I can cat the file as the deploy user just fine. 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "Capistrano" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to capistrano+...@googlegroups.com.
>>>>>>> To view this discussion on the web, visit 
>>>>>>> https://groups.google.com/d/msgid/capistrano/56a2a2dd-fd26-4b14-a2da-0d7af37f8354%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/capistrano/56a2a2dd-fd26-4b14-a2da-0d7af37f8354%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>

