xubo245 created CARBONDATA-2385:
-----------------------------------
Summary: The result is incorrect when read data from carbonfile generated by SDK
Key: CARBONDATA-2385
URL: https://issues.apache.org/jira/browse/CARBONDATA-2385
Project: CarbonData
Issue Type: Bug
Reporter: xubo245
Assignee: xubo245
The result is incorrect when reading data from a carbonfile generated by the SDK.
When 10 million rows are generated by
org.apache.carbondata.spark.testsuite.createTable.TestCreateTableUsingSparkCarbonFileFormat,
the count returned is 5888000 instead of 10000000:
{code:java}
18/04/23 01:43:12 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebdb24c-8b92-45c3-b7c0-639da93c2984/_tmp_space.db
18/04/23 01:43:12 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /huawei/xubo/git/carbondata1/integration/spark-common/target/warehouse
18/04/23 01:43:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:19 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524472999] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
+-------+---+------+
|name |age|height|
+-------+---+------+
|robot0 |0 |0.0 |
|robot1 |1 |0.5 |
|robot2 |2 |1.0 |
|robot3 |3 |1.5 |
|robot4 |4 |2.0 |
|robot5 |5 |2.5 |
|robot6 |6 |3.0 |
|robot7 |7 |3.5 |
|robot8 |8 |4.0 |
|robot9 |9 |4.5 |
|robot10|10 |5.0 |
|robot11|11 |5.5 |
|robot12|12 |6.0 |
|robot13|13 |6.5 |
|robot14|14 |7.0 |
|robot15|15 |7.5 |
|robot16|16 |8.0 |
|robot17|17 |8.5 |
|robot18|18 |9.0 |
|robot19|19 |9.5 |
+-------+---+------+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
+------+---+------+
+-------+
|name |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows
+---+
|age|
+---+
|0 |
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
|robot4|4 |2.0 |
|robot5|5 |2.5 |
|robot6|6 |3.0 |
|robot7|7 |3.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
|robot3|3 |1.5 |
|robot4|4 |2.0 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
+------+---+------+
+-------------+
|sum(age) |
+-------------+
|1515150959596|
+-------------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:47 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524473027] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
+-------+---+------+
|name |age|height|
+-------+---+------+
|robot0 |0 |0.0 |
|robot1 |1 |0.5 |
|robot2 |2 |1.0 |
|robot3 |3 |1.5 |
|robot4 |4 |2.0 |
|robot5 |5 |2.5 |
|robot6 |6 |3.0 |
|robot7 |7 |3.5 |
|robot8 |8 |4.0 |
|robot9 |9 |4.5 |
|robot10|10 |5.0 |
|robot11|11 |5.5 |
|robot12|12 |6.0 |
|robot13|13 |6.5 |
|robot14|14 |7.0 |
|robot15|15 |7.5 |
|robot16|16 |8.0 |
|robot17|17 |8.5 |
|robot18|18 |9.0 |
|robot19|19 |9.5 |
+-------+---+------+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
+------+---+------+
+-------+
|name |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows
+---+
|age|
+---+
|0 |
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
|robot4|4 |2.0 |
|robot5|5 |2.5 |
|robot6|6 |3.0 |
|robot7|7 |3.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
|robot3|3 |1.5 |
|robot4|4 |2.0 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
+------+---+------+
+-------------+
|sum(age) |
+-------------+
|1515150959596|
+-------------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
18/04/23 01:43:56 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 38)
org.apache.spark.SparkException: Index file not present to read the carbondata file
at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:231)
at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/04/23 01:43:56 ERROR TaskSetManager: Task 0 in stage 32.0 failed 1 times; aborting job
Process finished with exit code 0
{code}