Hi, I am running a Hadoop cluster that I built myself on EC2, and I am running out of disk space during the map phase. If I knew which directories Hadoop uses for map spill, I could point them at the large ephemeral drive on the EC2 machines. Otherwise I would have to keep growing the root volume, and that's not very smart.
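If it helps, here is what I was planning to try. My understanding (please correct me if I'm wrong) is that in Hadoop 1.x the spill and intermediate map output location is controlled by the mapred.local.dir property, so something like this in mapred-site.xml, assuming the ephemeral disk is mounted at /mnt on my instances:

```xml
<!-- mapred-site.xml sketch: point local (map spill) storage at the
     ephemeral drive. Assumes the EC2 ephemeral disk is mounted at /mnt;
     the /mnt/hadoop/mapred/local path is my own choice, not a default. -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/hadoop/mapred/local</value>
</property>
```

Is that the right property, or are there other directories involved in the spill?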
Thank you. The error I get is below.

Sincerely, Mark

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1495)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1180)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs

java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:886)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.freeeed.main.ZipFileProcessor.emitAsMap(ZipFileProcessor.java:279)
    at org.freeeed.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:107)
    at org.freeeed.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
    at org.freeeed.main.Map.map(Map.java:70)
    at org.freeeed.main.Map.map(Map.java:24)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(User

java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:886)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.freeeed.main.ZipFileProcessor.emitAsMap(ZipFileProcessor.java:279)
    at org.freeeed.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:107)
    at org.freeeed.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
    at org.freeeed.main.Map.map(Map.java:70)
    at org.freeeed.main.Map.map(Map.java:24)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(User

org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:178)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:292)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:272)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: EEXIST: File exists
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)
    ... 7 more