[ https://issues.apache.org/jira/browse/HADOOP-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596743#action_12596743 ]

Vinod Kumar Vavilapalli commented on HADOOP-3389:
-------------------------------------------------

I debugged this on the hudson.zones Solaris box and found the actual reason for
the failure.

When I run /bin/sh -c "sleep 10", two processes (?!) are spawned:
bq. 28998  /bin/sh /bin/sh /bin/sh (??!)
and 
bq. 28999  sleep 10

On RHEL, by contrast, it spawns only one process, which is the command we wish to run:
bq. 29015 sleep 10
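To check which behaviour a given shell has without reading ps output, here is a minimal Python 3 sketch (not HOD code; the function name shell_execs_command is hypothetical). It runs `shell -c CMD`, where CMD is a child Python interpreter that prints its own pid, and compares that pid with the one Popen returned. If they match, the shell exec'd the command in place (the RHEL behaviour above); if they differ, the shell forked a separate child (the Solaris /bin/sh behaviour).

```python
import shlex
import subprocess
import sys


def shell_execs_command(shell: str) -> bool:
    """Return True if `shell -c CMD` runs CMD under the pid Popen returned
    (the shell exec'd it), False if the shell forked a separate child."""
    # CMD: a child Python interpreter that prints its own pid and exits.
    cmd = shlex.quote(sys.executable) + " -c 'import os; print(os.getpid())'"
    proc = subprocess.Popen([shell, "-c", cmd], stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    # Same pid => the shell replaced itself with CMD; different => it forked.
    return int(out.strip()) == proc.pid


if __name__ == "__main__":
    for shell in ("/bin/sh", "/bin/bash"):
        try:
            print(shell, "execs the command:", shell_execs_command(shell))
        except FileNotFoundError:
            print(shell, "not found")
```

Per the ps listings above, one would expect /bin/sh on the Solaris box to report False, while bash reports True.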

We use /bin/sh -c "command" to run every external command from inside HOD, via
the src.contrib.hod.Common.threads.simpleCommand class. The pid returned by
this class is that of the direct child, which on Solaris is the /bin/sh process
(pid 28998 above) rather than the command itself. In this test case that
process doesn't exit until 300 secs have elapsed, which explains the issue.
The issue went away when I changed simpleCommand to use /bin/bash instead of
/bin/sh to run the given command.

So, a possible fix is to change the implementation of simpleCommand to use
/bin/bash instead of /bin/sh. Hadoop already uses /bin/bash in many places, so
this shouldn't be a problem.
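For illustration only, a minimal Python 3 sketch of the pattern and the proposed fix. SimpleCommand below is a hypothetical, simplified stand-in, not the real simpleCommand class from src.contrib.hod. With shell=True on POSIX, subprocess.Popen runs [executable, "-c", command], so the executable argument selects the shell in place of the default /bin/sh; the pid it returns is always the direct child.

```python
import subprocess


class SimpleCommand:
    """Hypothetical stand-in for HOD's simpleCommand: runs `shell -c command`
    and exposes the pid of the direct child (the shell process itself)."""

    # Proposed fix: default to /bin/bash instead of /bin/sh.
    DEFAULT_SHELL = "/bin/bash"

    def __init__(self, command: str, shell: str = DEFAULT_SHELL):
        self.command = command
        self.shell = shell
        self._proc = None

    def start(self) -> int:
        # On POSIX, shell=True plus executable=... runs [shell, "-c", command].
        self._proc = subprocess.Popen(self.command, shell=True, executable=self.shell)
        # This is the command's pid only if the shell exec'd it; if the shell
        # forked instead, it is the shell's pid and killing it orphans the command.
        return self._proc.pid

    def kill(self) -> int:
        """Send SIGKILL to the direct child and return its exit status."""
        self._proc.kill()
        return self._proc.wait()

    def wait(self) -> int:
        """Block until the direct child exits and return its exit status."""
        return self._proc.wait()
```

For example, SimpleCommand("sleep 300").start() returns the pid of sleep itself under bash, so killing that pid stops the command immediately instead of leaving it running for 300 secs.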

Side notes:
1) The first process (pid 28998) is odd in any case. Is this how Solaris's
/bin/sh behaves, and if so, why?
2) Nigel observed that /bin/sh is a link to /sbin/sh on this machine (is that
expected, or odd?)

> [HOD] HOD unit test RunHodCleanupTests fails on solaris
> -------------------------------------------------------
>
>                 Key: HADOOP-3389
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3389
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Hemanth Yamijala
>
> HOD unit test RunHodCleanupTests fails on Solaris; this was first observed
> while submitting HADOOP-3023. The first time, Hudson failed to run this test
> altogether. The second time, it took 300 secs to finish instead of returning
> immediately as it does on RHEL boxes. Because of this, HADOOP-3023 is
> blocked, and so no HOD unit test can be run by Hudson at all.

-- 
This message is automatically generated by JIRA.