[ https://issues.apache.org/jira/browse/HADOOP-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605212#action_12605212 ]

Karam Singh commented on HADOOP-3531:
-------------------------------------

To verify this issue, I did the following:
1. Tried a scenario where the --gridservice-mapred.pkgs and --gridservice-hdfs.pkgs
paths were correct on only three nodes, with max-master-failure=12. HOD
allocation of 15 nodes succeeded three times. From the ringmaster log:
    a. namenode came up on the 2nd retry; jobtracker came up on the 4th retry
after 3 failures.
    b. namenode came up on the 9th retry after 8 failures; jobtracker came up on
the first try.
    c. namenode came up on the first try; jobtracker came up on the 3rd retry
after 2 failures.
2. Tried a scenario where the --gridservice-mapred.pkgs path was correct on only
two nodes, with max-master-failure=13, using static DFS. HOD allocation of 15
nodes succeeded 4 times. From the ringmaster log: jobtracker came up on the
first try for 3 allocations; in the 4th allocation it came up on the 8th retry
after 7 failures.
3. Tried a scenario where --hodring.java-home was correct only on the ringmaster
node, with max-failures=12. namenode came up on the ringmaster node. All other
14 hodrings failed to start with an "Invalid --hodring.java-home" error
(observed from the ringmaster log). The ringmaster waited 2 minutes for mapred
before giving up.
4. Tried a scenario where --hodring.java-home was correct on 3 nodes, with
max-failures=12, and allocated 15 nodes with hod. namenode came up on the
ringmaster node. 12 hodrings failed with an invalid --hodring.java-home error.
jobtracker, datanode and tasktracker came up on the remaining two nodes.
    
Also tried some negative tests with max-failures=2:
1. Provided a wrong --hodring.pkgs. Verified that HOD allocation fails because
the ringmaster failed, and that a proper error message is shown.
2. Provided wrong paths for --gridservice-mapred.pkgs and
--gridservice-hdfs.pkgs. Verified that the proper error message from the
ringmaster log is displayed on the HOD client side. Also tried with an invalid
tarball.
3. Tried a scenario with the --gridservice-mapred.pkgs and --gridservice-hdfs.pkgs
paths correct only on the ringmaster node, with max-master-failure=2.
    Tried two times:
    a. HOD allocation failed because the jobtracker failed to start, with a
proper error message, and the ringmaster log showed: "Detected errors (3)
beyond allowed number of failures (2). Flagging error to client"
    b. HOD allocation failed because the namenode failed to start, with a
proper error message, and the ringmaster log showed: "Detected errors (3)
beyond allowed number of failures (2). Flagging error to client"
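
As a rough illustration of the threshold these tests exercise, here is a
minimal Python sketch. It assumes that a master (namenode or jobtracker) is
retried until it starts or its failure count goes beyond the configured limit,
which matches the logged "Detected errors (3) beyond allowed number of failures
(2)" with a limit of 2. The function names exceeds_limit and launch_master are
hypothetical and are not taken from HOD's ringmaster code.

```python
# Hypothetical sketch of the failure-threshold behaviour seen in the
# ringmaster log; this is NOT HOD's actual ringmaster implementation.


def exceeds_limit(errors: int, max_failures: int) -> bool:
    """True once the error count goes beyond the allowed number of failures."""
    return errors > max_failures


def launch_master(attempt_results, max_failures):
    """Retry starting a master until it succeeds or failures exceed the limit.

    attempt_results: iterable of booleans, one per attempt (True = started).
    Returns a (started, errors) tuple.
    """
    errors = 0
    for ok in attempt_results:
        if ok:
            # Master came up; report how many failures preceded it.
            return True, errors
        errors += 1
        if exceeds_limit(errors, max_failures):
            # Mirrors the message observed in the ringmaster log.
            print(f"Detected errors ({errors}) beyond allowed number of "
                  f"failures ({max_failures}). Flagging error to client")
            return False, errors
    # Ran out of attempts without starting and without hitting the limit.
    return False, errors
```

For example, launch_master([False] * 8 + [True], 12) returns (True, 8), like
the namenode in scenario 1b, while launch_master([False] * 5, 2) flags the
error after the 3rd failure and returns (False, 3).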


> Hod does not report job tracker failure on hod client side when job tracker
> fails to come up
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3531
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.18.0
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: 3531.patch
>
>
> Hod does not report job tracker failure on the hod client side when the job
> tracker fails to come up.
> When max-master-failure > 1, the hod client does not properly show why the job
> tracker failed to come up, whereas for the namenode a proper error message is
> displayed.
> Also, on namenode failure the ringmaster log contains information such as:
> "Detected errors (3) beyond allowed number of failures (2). Flagging error to
> client"
> while no such information appears in the ringmaster log for job tracker
> failures.

-- 
This message is automatically generated by JIRA.