Hi Yixu, Thanks for the ddl. But i am particularly interested on the data in order to reproduce the problem. Will it be feasible to share to data or if it is huge then the scripts that generates it? I cannot access the data location from public network.
On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <[email protected]> wrote: > dev > > Step 1:I make a hive table qqdata2.h_indextest1: > CREATE EXTERNAL TABLE `qqdata2.h_indextest1`( > `id` INT, > `CUST_ORDER_ID` STRING, > `ORDER_ITEM_IDATTR_ID` STRING, > `ATTR_VALUE_IDATTR_VALUE` STRING, > `CREATE_DATE` STRING, > `UPDATE_DATE` STRING, > `STATUS_CD` STRING, > `STATUS_DATE` STRING, > `AREA_ID` STRING, > `REGION_CD` STRING, > `UPDATE_STAFF` STRING, > `CREATE_STAFF` STRING, > `SHARDING_ID` STRING, > `ORDER_ATTR_ID` STRING ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\u0001' > LINES TERMINATED BY '\n' > NULL DEFINED AS '' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io > > .HiveIgnoreKeyTextOutputFormat' > LOCATION > 'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1 > > '; > > there are csv files in hdfs dir user/bigdata/streamcql/dist1, the data > format in the csv file just as following > > 1939505130,171483932,287305502,813463930,20160709134396669, > 201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875, > 9999999 > > Step 2:I make a carbon table qqdata2.c_indextest1: > cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING, > CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID STRING,ATTR_VALUE_IDATTR_VALUE > STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD STRING,STATUS_DATE > STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF > STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata' ") > > Step 3:Insert data: > cc.sql("insert into qqdata2.c_indextest1 select * from > qqdata2.h_indextest1").show(100,false); > > Step 4: Repeat from step1 to step 3, I make another carbon table > qqdata2.c_indextest2 > The record number of qqdata2.c_indextest1 is 30w, the record number of > qqdata2.c_indextest2 is 1700w. > > > > yixu2001 > > From: sounak > Date: 2017-10-19 14:13 > To: dev > Subject: Re: Update statement failed with "Multiple input rows matched for > same row" in version 1.2.0, > Hi Yixu, > > Can you please share the DDLs and the Data for the above problem with us? > > Thanks > Sounak > > On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <[email protected]> wrote: > > > dev > > > > > > In carbondata version 1.2.0, I execute "update" statement with sub-query, > > it failed. > > All the rows in the 2 tables are not duplicated, and the same statement > > will succeed in carbondata version 1.1.1. > > > > The test log as following: > > scala> cc.sql("select count(*), count(distinct id) from > > qqdata2.c_indextest1").show(100,false); > > +--------+------------------+ > > |count(1)|count(DISTINCT id)| > > +--------+------------------+ > > |300000 |300000 | > > +--------+------------------+ > > > > scala> cc.sql("select count(*), count(distinct id) from > > qqdata2.c_indextest2").show(100,false); > > +--------+------------------+ > > |count(1)|count(DISTINCT id)| > > +--------+------------------+ > > |71223220|71223220 | > > +--------+------------------+ > > > > scala> cc.sql("update qqdata2.c_indextest2 a set(a.CUST_ORDER_ID,a.ORDER_ > > ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a. > > UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a. > > REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDER_ATTR_ID) = > > (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_ > > VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_ > > DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b. > SHARDING_ID,b.ORDER_ATTR_ID > > from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false); > > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of a > > plan since it was too large. This behavior can be adjusted by setting > > 'spark.debug.maxToStringFields' in SparkEnv.conf. > > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn > > > > ][bigdata][Thread-1]Delete data operation is failed for > > qqdata2.c_indextest2 > > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is > > failed due to failure in creating delete delta file for segment : null > > block : null > > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in > update > > operationjava.lang.Exception: Multiple input rows matched for same row. > > java.lang.RuntimeException: Update operation failed. Multiple input rows > > matched for same row. > > at scala.sys.package$.error(package.scala:27) > > at org.apache.spark.sql.execution.command.ProjectForUpdateCommand. > > processData(IUDCommands.scala:239) > > at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run( > > IUDCommands.scala:141) > > at org.apache.spark.sql.execution.command.ExecutedCommandExec. > > sideEffectResult$lzycompute(commands.scala:58) > > at org.apache.spark.sql.execution.command.ExecutedCommandExec. > > sideEffectResult(commands.scala:56) > > at org.apache.spark.sql.execution.command.ExecutedCommandExec. > > executeTake(commands.scala:71) > > at org.apache.spark.sql.execution.CollectLimitExec. > > executeCollect(limit.scala:38) > > at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$ > > Dataset$$execute$1$1.apply(Dataset.scala:2378) > > at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId( > > SQLExecution.scala:57) > > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780) > > at org.apache.spark.sql.Dataset.org > > > > $apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377) > > at org.apache.spark.sql.Dataset.org > > > > $apache$spark$sql$Dataset$$collect(Dataset.scala:2384) > > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset. > > scala:2120) > > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset. > > scala:2119) > > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810) > > at org.apache.spark.sql.Dataset.head(Dataset.scala:2119) > > at org.apache.spark.sql.Dataset.take(Dataset.scala:2334) > > at org.apache.spark.sql.Dataset.showString(Dataset.scala:248) > > at org.apache.spark.sql.Dataset.show(Dataset.scala:640) > > ... 50 elided > > > > > > > > yixu2001 > > > > > > -- > Thanks > Sounak > -- Thanks Sounak
