dev This issue has caused great trouble for our production. I will appreciate if you have any plan to fix it and let me know.
yixu2001 From: BabuLal Date: 2018-03-23 00:20 To: dev Subject: Re: Getting [Problem in loading segment blocks] error after doing multi update operations hi all i am able to reproduce same exception in my cluster and got the same exception. (Trace is listed below) ------ scala> carbon.sql("select count(*) from public.c_compact4").show 2018-03-22 20:40:33,105 | WARN | main | main spark.sql.sources.options.keys expected, but read nothing | org.apache.carbondata.common.logging.impl.StandardLogService.logWarnMessage(StandardLogService.java:168) org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange SinglePartition +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1443L]) +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :public, Table name :c_compact4, Schema :Some(StructType(StructField(id,StringType,true), StructField(qqnum,StringType,true), StructField(nick,StringType,true), StructField(age,StringType,true), StructField(gender,StringType,true), StructField(auth,StringType,true), StructField(qunnum,StringType,true), StructField(mvcc,StringType,true))) ] public.c_compact4[] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235) at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:372) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:113) at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128 at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127) at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818) at org.apache.spark.sql.Dataset.head(Dataset.scala:2127) at org.apache.spark.sql.Dataset.take(Dataset.scala:2342) at org.apache.spark.sql.Dataset.showString(Dataset.scala:248) at org.apache.spark.sql.Dataset.show(Dataset.scala:638) at org.apache.spark.sql.Dataset.show(Dataset.scala:597) at org.apache.spark.sql.Dataset.show(Dataset.scala:606) ... 48 elided Caused by: java.io.IOException: Problem in loading segment blocks. at org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:153) at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:76) at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:72) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:739 at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:666) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:426) at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:107) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121 at org.apache.spark.rdd.RDD.partitions(RDD.scala:251 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91) at org.apache.spark.sql.execution.exchange.ShuffleExchange$.prepareShuffleDependency(ShuffleExchange.scala:273) at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:84) at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:121) at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) ... 81 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getLocations(AbstractDFSCarbonFile.java:509) at org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:142) ----------------Store location---- ---- linux-49:/opt/babu # hadoop fs -ls /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/*.deletedelta -rw-rw-r--+ 3 hdfs hive 177216 2018-03-22 18:20 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723019528.deletedelta -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723886214.deletedelta -rw-rw-r--+ 3 hdfs hive 87989 2018-03-22 18:20 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723019528.deletedelta -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723886214.deletedelta -rw-rw-r--+ 3 hdfs hive 87989 2018-03-22 18:20 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723019528.deletedelta -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35 /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723886214.deletedelta ----------------------------------------------------------- Issue reproduced technique :- Writing content of delete delta is failed but deletedelta file created successfully . Failed during Horizontal Compaction ( added setSpaceQuota in hdfs so that file can created successfully and write to this file is failed) *Below points to be handled to fix this issue.* 1. When Horizontal Compaction is failed 0 byte delete delta file should be deleted currently it is not deleted. This is a cleaning part of the Horizontal Compaction fail . 2. delete delta of 0 byte should not be considered while reading .( we can further discuss about this solution ) . currently tablestatus file has the entry of deletedelta timestamp. 3. If deleting is in progress , file is created (name node has entry of file) but data writing is in progress (not yet flush) but at same time select query is triggered ,then Query will failed so this scenario also need to handle. @dev :- Please Let me know if any other detail is needed. Thanks Babu -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/