[ https://issues.apache.org/jira/browse/HADOOP-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654255#action_12654255 ]
Steve Loughran commented on HADOOP-2483:
----------------------------------------

This is interesting to me; I'm effectively doing some of this in our codebase as we try out some of the lifecycle, but my tests are still trying to bring up and stress a functional cluster and do not yet test how that cluster copes with various failure modes, such as

* transient loss of the namenode
* loss of 10%, 20%, 30%, 50%, or more than 50% of the workers, through either outages or network partitioning
* DNS playing up. Because it will, you know :)
* JT and TT failures
* MR job progress when namenodes start failing

There is also performance testing. Paper to read: http://googletesting.blogspot.com/2008/05/performance-testing-of-distributed-file.html

> Large-scale reliability tests
> -----------------------------
>
>                 Key: HADOOP-2483
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2483
>             Project: Hadoop Core
>          Issue Type: Test
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Devaraj Das
>             Fix For: 0.20.0
>
>
> The fact that we do not have any large-scale reliability tests bothers me.
> I'll be the first to admit that it isn't the easiest of tasks, but I'd like to
> start a discussion around this, especially given that the code-base is
> growing to an extent that interactions due to small changes are very hard to
> predict.
> One of the scripts I run for every patch I work on does something very
> simple: it runs sort500 (or greater), then randomly picks n tasktrackers from
> ${HADOOP_CONF_DIR}/conf/slaves and kills them; a similar script kills and
> restarts the tasktrackers.
> This helps in checking a fair number of reliability stories: lost
> tasktrackers, task failures, etc. Clearly this isn't good enough to cover
> everything, but it's a start.
> Let's discuss: what do we do for HDFS? We need more for Map-Reduce!
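
For illustration, the "pick n tasktrackers from the slaves file and kill them" step described in the issue could be sketched roughly as below. This is not the reporter's actual script, which is not shown in the issue. The function names are hypothetical, and the sketch assumes a slaves file with one host per line, passwordless ssh to each host, and `hadoop-daemon.sh` available on the remote PATH. It only builds the commands; running them is left to the caller.

```python
import random


def parse_slaves(text):
    """Return host names from slaves-file text, skipping blanks and '#' comments."""
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def pick_victims(hosts, n, rng=None):
    """Pick up to n distinct hosts at random (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    return rng.sample(hosts, min(n, len(hosts)))


def kill_command(host, daemon="tasktracker"):
    """Build the ssh command that stops one daemon on one host.

    Assumption: hadoop-daemon.sh is on the remote PATH; adjust for the
    actual install layout.
    """
    return ["ssh", host, "hadoop-daemon.sh", "stop", daemon]


def plan_kills(slaves_text, n, rng=None):
    """Return the list of commands that would stop n random tasktrackers."""
    victims = pick_victims(parse_slaves(slaves_text), n, rng)
    return [kill_command(host) for host in victims]
```

A driver would read the slaves file, call `plan_kills` while the sort job is running, and pass each command to something like `subprocess.run`. Passing a seeded `random.Random` makes a failing run reproducible.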