[ https://issues.apache.org/jira/browse/MAPREDUCE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated MAPREDUCE-7076:
--------------------------------------
    Labels: newbie pull-request-available  (was: newbie)

> TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7076
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.8.0
>            Reporter: Rushabh Shah
>            Assignee: Kevin Su
>            Priority: Minor
>              Labels: newbie, pull-request-available
>             Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>
>
> TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our
> internal Jenkins build.
> {noformat}
> java.lang.AssertionError: create_write should create the file
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
> {noformat}
> Below is my analysis of why it didn't create the file.
> {code:java|title=NNBench.java|borderStyle=solid}
> // Some comments here
> public void map(Text key,
>         LongWritable value,
>         OutputCollector<Text, Text> output,
>         Reporter reporter) throws IOException {
>   if (barrier()) {
>     String fileName = "file_" + value;
>     if (op.equals(OP_CREATE_WRITE)) {
>       startTimeTPmS = System.currentTimeMillis();
>       doCreateWriteOp(fileName, reporter);
>     } ...
>   } else {
>     output.collect(new Text("l:latemaps"), new Text("1"));
>   }
> }
>
> // Below are the relevant parts of the barrier() method
> private boolean barrier() {
>   ..
>   // If the sleep time is greater than 0, then sleep and return
>   ...
>   LOG.info("Waiting in barrier for: " + sleepTime + " ms");
>   return retVal;
> }
>
> // Below are the relevant parts of doCreateWriteOp
> private void doCreateWriteOp(String name,
>         Reporter reporter) {
>   FSDataOutputStream out;
>   byte[] buffer = new byte[bytesToWrite];
>   for (long l = 0l; l < numberOfFiles; l++) {
>     Path filePath = new Path(new Path(baseDir, dataDirName),
>             name + "_" + l);
>   }
>   ....
> }
> {code}
> The file {{BASE_DIR/data/file_0_0}} is created only if the map task starts
> before the time given by {{startTime}}.
> Refer to the chunk I pasted above:
> {{map(..)}} --> {{barrier()}}, and *only if* {{barrier()}} evaluates to true
> does it call {{doCreateWriteOp}}, which eventually creates the file.
> In the test case, the delay is 3 seconds, as set by
> {{"-startTime", "" + (Time.now() / 1000 + 3)}}.
> In the failing run, I can see the task starting at least 6 seconds after the
> test case started.
> {noformat}
> 2017-01-27 03:11:15,387 INFO  [Thread-4] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1711545156_0001
> 2017-01-27 03:11:23,405 INFO  [Thread-4] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
> {noformat}
> Also, when I run this test on my laptop, I see the following line being
> printed.
> {noformat}
> 2017-01-27 17:09:27,982 INFO  [LocalJobRunner Map Task Executor #0] hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
> {noformat}
> This line is printed only in the {{barrier()}} method, and I don't see it
> in the logs of the failed test.
>
> In our environment, the Jenkins server was very slow and it took more than 6
> seconds to launch a map task, which is longer than the 3-second delay.
> The correct fix, in my opinion, would be for the {{barrier()}} method to
> return true when there is no need to sleep. It should return false only
> when an exception occurs.
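The fix proposed above can be sketched roughly as follows. This is a minimal, standalone illustration and not the committed patch: the class name {{BarrierSketch}} and the two-argument {{barrier(startTimeMs, nowMs)}} signature are hypothetical, chosen so the logic can run without a job configuration. It assumes the sleep time is the difference between the configured start time and the current time.

```java
// Hypothetical stand-in for the barrier() logic described in the issue.
// Names and signature are assumptions made for illustration only.
class BarrierSketch {

    // Returns true if the caller may proceed, false only if the wait failed.
    static boolean barrier(long startTimeMs, long nowMs) {
        long sleepTime = startTimeMs - nowMs;

        // The start time has already passed (for example, a slow build
        // server launched the map task late): nothing to wait for.
        if (sleepTime <= 0) {
            return true;
        }

        try {
            // Wait until the agreed start time, then proceed.
            Thread.sleep(sleepTime);
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Task launched 3 seconds after the start time: proceeds at once.
        System.out.println(barrier(now - 3000, now)); // prints true
        // Task launched 50 ms early: sleeps, then proceeds.
        System.out.println(barrier(now + 50, now));   // prints true
    }
}
```

With this shape, a map task that starts after {{startTime}} still runs its operation instead of falling into the {{l:latemaps}} branch, so the test no longer depends on how quickly the task is launched.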