[
https://issues.apache.org/jira/browse/HADOOP-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605212#action_12605212
]
Karam Singh commented on HADOOP-3531:
-------------------------------------
To verify this issue, I did the following:
1. Tried a scenario where the --gridservice-mapred.pkgs and
--gridservice-hdfs.pkgs paths were correct on three nodes, with
max-master-failure=12. Ran a hod allocation with 15 nodes three times, all
successful, and monitored the ringmaster log:
a. namenode came up on the 2nd attempt. jobtracker came up on the 4th attempt
after 3 failures.
b. namenode came up on the 9th attempt after 8 failures. jobtracker came up on
the 1st attempt.
c. namenode came up on the 1st attempt. jobtracker came up on the 3rd attempt
after 2 failures.
2. Tried a scenario where the --gridservice-mapred.pkgs path was correct on two
nodes, with max-master-failure=13 and static dfs. Ran a hod allocation with 15
nodes four times, all successful, and monitored the ringmaster log: jobtracker
came up on the 1st attempt for 3 allocations. In the 4th allocation, jobtracker
came up on the 8th attempt after 7 failures.
3. Tried a scenario where --hodring.java-home was correct only on the
ringmaster node, with max-failures=12. namenode came up on the ringmaster node.
All other 14 hodrings failed to start with an "Invalid --hodring.java-home"
error (observed in the ringmaster log). ringmaster waited 2 minutes for mapred
before giving up.
4. Tried a scenario where --hodring.java-home was correct on 3 nodes, with
max-failures=12. Tried a hod allocation with 15 nodes. namenode came up on the
ringmaster node. 12 hodrings failed with the invalid --hodring.java-home error.
jobtracker, dn and tt came up on the remaining two nodes.
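The retry behaviour observed in the scenarios above can be pictured with a
small sketch. This is not HOD's actual code. The function name
bring_up_master, the exception MasterStartError and the list-of-booleans input
are hypothetical, and the sketch assumes that allocation is flagged as failed
only once the failure count goes beyond the allowed maximum.

```python
class MasterStartError(Exception):
    """Raised when a master daemon cannot be started (hypothetical)."""


def bring_up_master(launch_results, max_failures):
    """Walk through launch attempts in order.

    launch_results: iterable of booleans, one per attempt (True = started).
    Returns (attempt_number, failures_so_far) for the first success.
    Raises MasterStartError once failures exceed max_failures.
    """
    failures = 0
    for attempt, started in enumerate(launch_results, start=1):
        if started:
            return attempt, failures
        failures += 1
        # Assumption: the limit is exceeded only when failures > max_failures.
        if failures > max_failures:
            raise MasterStartError(
                "Detected errors (%d) beyond allowed number of failures (%d)."
                % (failures, max_failures)
            )
    raise MasterStartError("No more nodes to try after %d failures" % failures)
```

With a limit of 12, eight failed launches followed by a success return
(9, 8), which matches case 1b above. With a limit of 2, three failed launches
raise the error instead.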
Also tried some negative tests with max-failures=2:
1. Provided a wrong --hodring.pkgs. Verified that the hod allocation fails, as
ringmaster failed with a proper error message.
2. Provided a wrong path for --gridservice-mapred.pkgs and
--gridservice-hdfs.pkgs. Verified that the proper error message from the
ringmaster log is displayed on the hod client side. Also tried with an invalid
tarball.
3. Tried a scenario with the --gridservice-mapred.pkgs and
--gridservice-hdfs.pkgs paths correct only on the ringmaster node, with
max-master-failure=2.
Tried two times:
a. hod allocation failed as jobtracker failed to start, with a proper error
message, and the ringmaster log also showed: Detected errors (3) beyond allowed
number of failures (2). Flagging error to client
b. hod allocation failed as namenode failed to start, with a proper error
message, and the ringmaster log also showed: Detected errors (3) beyond allowed
number of failures (2). Flagging error to client
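A check like the one done by hand on the ringmaster log could be scripted along
these lines. This is only a sketch: the helper name find_failure_overruns is
hypothetical, and it assumes the message quoted above appears on a single line
in the actual log file.

```python
import re

# Pattern built from the message text quoted in this comment.
OVERRUN_PATTERN = re.compile(
    r"Detected errors \((\d+)\) beyond allowed number of failures \((\d+)\)"
)


def find_failure_overruns(log_lines):
    """Return a list of (errors, allowed) tuples, one per matching log line."""
    overruns = []
    for line in log_lines:
        match = OVERRUN_PATTERN.search(line)
        if match:
            overruns.append((int(match.group(1)), int(match.group(2))))
    return overruns
```

An empty result for a failed allocation would point at the behaviour this
issue describes, where the ringmaster log carries no such line.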
> Hod does not report job tracker failure on hod client side when job tracker
> fails to come up
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-3531
> URL: https://issues.apache.org/jira/browse/HADOOP-3531
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/hod
> Affects Versions: 0.18.0
> Reporter: Karam Singh
> Assignee: Hemanth Yamijala
> Priority: Blocker
> Fix For: 0.18.0
>
> Attachments: 3531.patch
>
>
> Hod does not report job tracker failure on the hod client side when the job
> tracker fails to come up.
> When max-master-failure > 1, the hod client does not properly show why the job
> tracker failed to come up, while in the namenode case a proper error message
> is displayed.
> Also, on namenode failure the ringmaster log contains information such as:
> "Detected errors (3) beyond allowed number of failures (2). Flagging error to
> client"
> while there is no such information in the ringmaster log for job tracker
> failures.