Hi,

The method readAllFootersInParallel is implemented in Parquet's ParquetFileReader, not in Spark, so the Spark config "spark.sql.files.ignoreCorruptFiles" has no effect on it.
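As a caller-side workaround, one could check each file for the 4-byte magic "PAR1" that Parquet files end with before handing the paths to Spark. What follows is only a rough sketch under assumptions: the names ParquetMagicCheck, hasParquetMagicTail and filterReadable are made up for illustration, and it reads local files through java.io, so for HDFS paths the same check would have to go through Hadoop's FileSystem API instead. A file can also pass this check and still be corrupt elsewhere.

```scala
import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets
import scala.util.Try

// Hypothetical helper, not part of Spark or Parquet.
object ParquetMagicCheck {
  // "PAR1" is the byte sequence [80, 65, 82, 49] that ends a Parquet file.
  val Magic: Array[Byte] = "PAR1".getBytes(StandardCharsets.US_ASCII)

  /** True if the last four bytes of the local file at `path` equal the Parquet magic. */
  def hasParquetMagicTail(path: String): Boolean =
    Try {
      val file = new RandomAccessFile(path, "r")
      try {
        if (file.length() < Magic.length) false
        else {
          val tail = new Array[Byte](Magic.length)
          file.seek(file.length() - Magic.length)
          file.readFully(tail)
          java.util.Arrays.equals(tail, Magic)
        }
      } finally file.close()
    }.getOrElse(false) // missing or unreadable files are treated as corrupt

  /** Keeps only the paths whose tail looks like a Parquet file, preserving order. */
  def filterReadable(paths: Seq[String]): Seq[String] =
    paths.filter(hasParquetMagicTail)
}
```

The filtered list could then be passed on with something like sqlContext.read.parquet(ParquetMagicCheck.filterReadable(paths): _*). Note that this runs sequentially on the driver, so it may be slow for a large number of files.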
Reading all footers in parallel speeds up the task, but it gives us no way to control whether corrupt files are ignored. Of course we could read the footers sequentially and skip the corrupt ones, but that might be inefficient. Since this is a relatively rare corner case, I don't expect we can have this.

Maybe Parquet can implement an option to ignore corrupt files. Even so, we can't expect an updated Parquet release with that option to be available to Spark very soon.

khyati wrote
> Hi Reynold Xin,
>
> In spark 2.1.0,
> I tried setting spark.sql.files.ignoreCorruptFiles = true by using
> commands,
>
> val sqlContext =new org.apache.spark.sql.hive.HiveContext(sc)
>
> sqlContext.setConf("spark.sql.files.ignoreCorruptFiles","true") /
> sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
>
> but still getting error while reading parquet files using
> val newDataDF =
> sqlContext.read.parquet("/data/tempparquetdata/corruptblock.0","/data/tempparquetdata/data1.parquet")
>
> Error: ERROR executor.Executor: Exception in task 0.0 in stage 4.0 (TID 4)
> java.io.IOException: Could not read footer: java.lang.RuntimeException:
> hdfs://192.168.1.53:9000/data/tempparquetdata/corruptblock.0 is not a
> Parquet file. expected magic number at tail [80, 65, 82, 49] but found
> [65, 82, 49, 10]
> at
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
>
>
> Please let me know if I am missing anything.

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Skip-Corrupted-Parquet-blocks-footer-tp20418p20450.html