[ https://issues.apache.org/jira/browse/MNEMONIC-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248268#comment-16248268 ]
Wang, Gang edited comment on MNEMONIC-399 at 11/11/17 1:47 AM:
---------------------------------------------------------------
This bug looks to have been fixed; it needs to be further verified with bigger datasets. The reason appears to be that the HadoopRDD's iterator cannot be exhausted aggressively inside of RDD.compute().

was (Author: qichfan):
This bug looks to have been fixed; it needs to be further verified with bigger datasets. The reason appears to be that the HadoopRDD's iterator cannot be exhausted inside of RDD.compute().

> Hadoop.fs thrown Bad file descriptor exception
> -----------------------------------------------
>
>                 Key: MNEMONIC-399
>                 URL: https://issues.apache.org/jira/browse/MNEMONIC-399
>             Project: Mnemonic
>          Issue Type: Bug
>          Components: Spark-Integration
>    Affects Versions: 0.10.0-incubating
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>
> The implementation of DurableRDD causes the following exception when the
> number of RDD partitions is greater than 8, according to my tests; it is
> not clear what the root cause of this issue is.
> {quote}
> org.apache.hadoop.fs.FSError: java.io.IOException: Bad file descriptor
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:161)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:257)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
>         at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
>         at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
>         at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>         at org.apache.mnemonic.spark.rdd.DurableRDD$.prepareDurablePartition(DurableRDD.scala:226)
>         at org.apache.mnemonic.spark.rdd.DurableRDD.compute(DurableRDD.scala:88)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>         at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:108)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Bad file descriptor
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:154)
>         ... 41 more
> {quote}
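To illustrate the iterator-consumption pattern the comment refers to, below is a minimal Scala sketch of a pass-through RDD whose compute() drains its parent iterator eagerly. The class name PassThroughRDD and the toArray buffering are assumptions for illustration only; this is not the actual DurableRDD.prepareDurablePartition code.

{code:scala}
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative pass-through RDD; not the actual DurableRDD implementation.
class PassThroughRDD[T: ClassTag](var prev: RDD[T]) extends RDD[T](prev) {

  override def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    // Eager pattern: drain the parent iterator (ultimately a HadoopRDD record
    // reader over a local-file input stream) completely inside compute()
    // before returning anything. Per the comment above, this "aggressive"
    // exhaustion is the suspected trigger of the Bad file descriptor error
    // once the partition count grows.
    val materialized = firstParent[T].iterator(split, context).toArray
    materialized.iterator

    // Lazy alternative: return the parent iterator (or a view over it) so
    // records are pulled on demand while the task's input stream is open:
    //   firstParent[T].iterator(split, context)
  }
}
{code}

Whether the fix keeps the consumption fully lazy or only scopes the eager copy differently would need to be confirmed against DurableRDD.prepareDurablePartition with larger datasets, as the comment suggests.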