[
https://issues.apache.org/jira/browse/HBASE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941526#comment-15941526
]
Appy commented on HBASE-16775:
------------------------------
Things tried so far:
Dumped the configuration to the logs to confirm that the MR job is correctly
picking up mapreduce.map.maxattempts.
Changed the code to use a MiniMapReduce cluster. By default it spawns 2 servers.
Looking at the minicluster logs, I see retries happening.
At this point I can't think of a way to make it work. Summarizing everything:
What this test was trying to verify is: if a mapper fails and we have retries
enabled, then the overall job should still pass.
To do so, it previously threw an exception from the mapper based on a
probability, which is crazy and highly flaky.
What I was trying to do is set retries to Y and throw exceptions X times, where
X < Y. Initially X is 0, and it is incremented on every injected failure. The
issue is that, since mapper runs are isolated, I can't find a way to maintain
the state of X across mapper attempts. As a result, even the 4th retry of a
mapper will see X = 0 initially.
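
To illustrate the problem described above, here is a minimal plain-Java sketch that does not use Hadoop at all. It assumes each mapper attempt can be modeled as a fresh object, standing in for a fresh, isolated task run. The class and method names (RetryStateSketch, FlakyMapper, runWithIsolatedState, runWithSharedState) are hypothetical and exist only for this illustration. The "shared" variant shows the behavior the test wanted, which is only possible here because everything runs in one process.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RetryStateSketch {

    /** Stand-in for one mapper attempt; a new instance models a new isolated run. */
    static class FlakyMapper {
        private int injectedFailures = 0;   // X: exists only inside this attempt
        private final int failuresToInject;

        FlakyMapper(int failuresToInject) {
            this.failuresToInject = failuresToInject;
        }

        void map() {
            if (injectedFailures < failuresToInject) {
                injectedFailures++;
                throw new RuntimeException("injected failure");
            }
        }
    }

    /** Each attempt gets fresh state, so X is 0 again on every retry. */
    public static boolean runWithIsolatedState(int maxAttempts, int failuresToInject) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            FlakyMapper mapper = new FlakyMapper(failuresToInject);
            try {
                mapper.map();
                return true;                // this attempt succeeded
            } catch (RuntimeException e) {
                // attempt failed; the "framework" retries with a new mapper
            }
        }
        return false;                       // all Y attempts failed
    }

    /** Hypothetical variant where X survives across attempts (the intended behavior). */
    public static boolean runWithSharedState(int maxAttempts, int failuresToInject) {
        AtomicInteger sharedFailures = new AtomicInteger(0);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (sharedFailures.get() < failuresToInject) {
                sharedFailures.incrementAndGet();
                continue;                   // injected failure, retry
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Y = 4 attempts, X = 3 injected failures
        System.out.println("isolated state passes: " + runWithIsolatedState(4, 3));
        System.out.println("shared state passes:   " + runWithSharedState(4, 3));
    }
}
```

With Y = 4 and X = 3, the isolated variant never passes because every attempt starts from X = 0 and fails, while the shared variant passes on the 4th attempt.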
Now I am thinking that my initial line of thought (in
[this|https://issues.apache.org/jira/browse/HBASE-16775?focusedCommentId=15553215&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15553215]
comment above) was right: this test is testing the internals of MapReduce, i.e. if
mapreduce.map.maxattempts is set, the MR framework should retry.
[~huaxiang], [~jmhsieh].
> Flakey test with TestExportSnapshot#testExportRetry and
> TestMobExportSnapshot#testExportRetry
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-16775
> URL: https://issues.apache.org/jira/browse/HBASE-16775
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Attachments: disable.patch, HBASE-16775.master.001.patch,
> HBASE-16775.master.002.patch, HBASE-16775.master.003.patch
>
>
> The root cause is that conf.setInt("mapreduce.map.maxattempts", 10) is not
> taken by the mapper job, so the retry is actually 0. Debugging to see why
> this is the case.