Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak Thu, 19 Oct 2017 03:03:44 -0700

Hi Yixu,

Thanks for the DDL. However, I particularly need the data in order to
reproduce the problem. Would it be feasible to share the data or, if it is
huge, the scripts that generate it? I cannot access the data location
from the public network.
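
If it helps, a generator along these lines would be enough for me. This is
only a rough sketch: the object name IndexTestDataGen, the random value
ranges and the seed are my assumptions, not anything taken from your setup.
It writes 14-column rows with a unique id in the first column, using '\u0001'
as the delimiter to match the FIELDS TERMINATED BY clause in your DDL (your
sample row shows commas, so pass "," instead if that is what the files
really contain).

```scala
import java.io.PrintWriter
import scala.util.Random

// Hypothetical generator: all names and value ranges here are assumptions.
object IndexTestDataGen {
  // Number of columns in qqdata2.h_indextest1, per the DDL in this thread.
  val ColumnCount = 14

  // Builds one row: the id followed by 13 random non-negative numeric strings.
  def makeRow(id: Long, rnd: Random, delimiter: String): String = {
    val rest = Seq.fill(ColumnCount - 1)(rnd.nextInt(Int.MaxValue).toString)
    (id.toString +: rest).mkString(delimiter)
  }

  // Lazily yields `count` rows with ids startId, startId + 1, ...
  // so that no id is repeated within one generated file.
  def rows(count: Int, startId: Long, seed: Long,
           delimiter: String = "\u0001"): Iterator[String] = {
    val rnd = new Random(seed)
    Iterator.range(0, count).map(i => makeRow(startId + i, rnd, delimiter))
  }

  // Writes the rows to a local file, one per line; copy it to HDFS afterwards.
  def writeFile(path: String, count: Int, startId: Long, seed: Long,
                delimiter: String = "\u0001"): Unit = {
    val out = new PrintWriter(path, "UTF-8")
    try rows(count, startId, seed, delimiter).foreach(line => out.println(line))
    finally out.close()
  }
}
```

For example, IndexTestDataGen.writeFile("/tmp/indextest1.csv", 300000, 1L, 42L)
would write 300,000 rows locally (the path is a placeholder). Using the same
startId for the larger file keeps its ids overlapping with the smaller one,
so the update sub-query has rows to match.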



On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 <[email protected]> wrote:

> dev
>
> Step 1: I created a Hive table qqdata2.h_indextest1:
> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
> `id` INT,
> `CUST_ORDER_ID` STRING,
> `ORDER_ITEM_IDATTR_ID` STRING,
> `ATTR_VALUE_IDATTR_VALUE` STRING,
> `CREATE_DATE` STRING,
> `UPDATE_DATE` STRING,
> `STATUS_CD` STRING,
> `STATUS_DATE` STRING,
> `AREA_ID` STRING,
> `REGION_CD` STRING,
> `UPDATE_STAFF` STRING,
> `CREATE_STAFF` STRING,
> `SHARDING_ID` STRING,
> `ORDER_ATTR_ID` STRING )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\u0001'
>   LINES TERMINATED BY '\n'
>   NULL DEFINED AS ''
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1';
>
> There are CSV files in the HDFS directory user/bigdata/streamcql/dist1; the
> data in each CSV file is formatted as follows:
>
> 1939505130,171483932,287305502,813463930,20160709134396669,
> 201607101469099594,1299,20160711996575390,10,73,302063,302064,127859875,
> 9999999
>
> Step 2: I created a CarbonData table qqdata2.c_indextest1:
> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING,
> CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID STRING,ATTR_VALUE_IDATTR_VALUE
> STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD STRING,STATUS_DATE
> STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF
> STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata' ")
>
> Step 3: I inserted the data:
> cc.sql("insert into qqdata2.c_indextest1 select * from
> qqdata2.h_indextest1").show(100,false);
>
> Step 4: Repeating steps 1 to 3, I created another CarbonData table,
> qqdata2.c_indextest2.
> The record count of qqdata2.c_indextest1 is 300,000 (30w); the record count
> of qqdata2.c_indextest2 is 17,000,000 (1700w).
>
>
>
> yixu2001
>
> From: sounak
> Date: 2017-10-19 14:13
> To: dev
> Subject: Re: Update statement failed with "Multiple input rows matched for
> same row" in version 1.2.0,
> Hi Yixu,
>
> Can you please share the DDLs and the Data for the above problem with us?
>
> Thanks
> Sounak
>
> On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <[email protected]> wrote:
>
> > dev
> >
> >
> > In CarbonData version 1.2.0, an "update" statement with a sub-query
> > fails.
> > Neither of the 2 tables contains duplicated rows, and the same statement
> > succeeds in CarbonData version 1.1.1.
> >
> > The test log is as follows:
> > scala> cc.sql("select count(*), count(distinct id) from
> > qqdata2.c_indextest1").show(100,false);
> > +--------+------------------+
> > |count(1)|count(DISTINCT id)|
> > +--------+------------------+
> > |300000  |300000            |
> > +--------+------------------+
> >
> > scala> cc.sql("select count(*), count(distinct id) from
> > qqdata2.c_indextest2").show(100,false);
> > +--------+------------------+
> > |count(1)|count(DISTINCT id)|
> > +--------+------------------+
> > |71223220|71223220          |
> > +--------+------------------+
> >
> > scala> cc.sql("update qqdata2.c_indextest2 a set(a.CUST_ORDER_ID,
> > a.ORDER_ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,
> > a.UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.REGION_CD,
> > a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDER_ATTR_ID) =
> > (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_VALUE,
> > b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_DATE,b.AREA_ID,
> > b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.SHARDING_ID,b.ORDER_ATTR_ID
> > from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
> > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of a
> > plan since it was too large. This behavior can be adjusted by setting
> > 'spark.debug.maxToStringFields' in SparkEnv.conf.
> > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]Delete data operation is failed for
> > qqdata2.c_indextest2
> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is
> > failed due to failure in creating delete delta file for segment : null
> > block : null
> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update
> > operationjava.lang.Exception: Multiple input rows matched for same row.
> > java.lang.RuntimeException: Update operation failed. Multiple input rows
> > matched for same row.
> >   at scala.sys.package$.error(package.scala:27)
> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
> >   at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
> >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
> >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> >   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
> >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
> >   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
> >   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
> >   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
> >   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
> >   at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
> >   ... 50 elided
> >
> >
> >
> > yixu2001
> >
>
>
>
> --
> Thanks
> Sounak
>



-- 
Thanks
Sounak
