Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

sounak Thu, 19 Oct 2017 19:28:32 -0700

Slight rectification in typo of the query syntax

" select a.remark,a.id, getTupleId() as *tupleId* from c_indextest1 a,
c_indextest2
b where a.id=b.id;"


On Fri, Oct 20, 2017 at 7:54 AM, sounak <[email protected]> wrote:

> Internally, we call a UDF to generate the TupleId and based on that
> tupleId we decide if the row is duplicated or not. You can run a slightly
> tweaked query
>
> " select a.remark,a.id, getTupleId() as TupleId from c_indextest1 a, 
> c_indextest2
> b where a.id=b.id; "
>
> In order to prompt Multiple Input Rows most probably the getTupleId() will
> give duplicate entries.
>
> In between your given query runs fine with the data i have generated in
> house. So this is data dependent. So it will be good if we get
> *origin_data* or the script which has build the data for origin_data.
>
> On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 <[email protected]> wrote:
>
>> dev
>>    one row record
>>
>>
>> yixu2001
>>
>> From: Liang Chen
>> Date: 2017-10-19 22:26
>> To: dev
>> Subject: Re: Re: Update statement failed with "Multiple input rows
>> matched for same row" in version 1.2.0,
>> Hi
>>
>> Execute the below query, return one row record or multiple row records ?
>> ---------------------------------
>> select a.remark from  c_indextest1 a where a.id=b.id
>>
>> Regards
>> Liang
>>
>>
>> yixu2001 wrote
>> > dev
>> >  You can follow the steps below to reproduce the problem.
>> > tables c_indextest2 has 1700w records and table c_indextest1 has about
>> 30w
>> > records.
>> >
>> > step 1:
>> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark
>> string)
>> > STORED BY 'carbondata'").show;
>> >
>> > step 2: origin_data is a existing table with 1700w records, and fields
>> in
>> > the table do not matter.
>> > cc.sql("insert into c_indextest2 select row_number() over(partition by
>> > a.PKid order by a.pkid) id,a.remark from  (SELECT '1' PKID,'dfsdd'
>> remark
>> > from origin_data limit 17000000) a").show;
>> >
>> > step 3:
>> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark
>> string)
>> > STORED BY 'carbondata'").show;
>> >
>> > step 4:
>> > cc.sql("insert into c_indextest1 select * from  c_indextest2 where
>> > pmod(cast(ID as int),50)=43").show;
>> >
>> > setp 5:
>> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
>> > c_indextest1 a where a.id=b.id)").show;
>> >
>> >
>> > yixu2001
>> >
>> > From: sounak
>> > Date: 2017-10-19 18:26
>> > To: dev
>> > Subject: Re: Re: Update statement failed with "Multiple input rows
>> matched
>> > for same row" in version 1.2.0,
>> > Hi Yixu,
>> >
>> > Thanks for the ddl. But i am particularly interested on the data in
>> order
>> > to reproduce the problem. Will it be feasible to share to data or if it
>> is
>> > huge then the scripts that generates it? I cannot access the data
>> location
>> > from public network.
>> >
>> >
>> > On Thu, Oct 19, 2017 at 3:00 PM, yixu2001 &lt;
>>
>> > yixu2001@
>>
>> > &gt; wrote:
>> >
>> >> dev
>> >>
>> >> Step 1:I make a hive table qqdata2.h_indextest1:
>> >> CREATE EXTERNAL TABLE `qqdata2.h_indextest1`(
>> >> `id` INT,
>> >> `CUST_ORDER_ID` STRING,
>> >> `ORDER_ITEM_IDATTR_ID` STRING,
>> >> `ATTR_VALUE_IDATTR_VALUE` STRING,
>> >> `CREATE_DATE` STRING,
>> >> `UPDATE_DATE` STRING,
>> >> `STATUS_CD` STRING,
>> >> `STATUS_DATE` STRING,
>> >> `AREA_ID` STRING,
>> >> `REGION_CD` STRING,
>> >> `UPDATE_STAFF` STRING,
>> >> `CREATE_STAFF` STRING,
>> >> `SHARDING_ID` STRING,
>> >> `ORDER_ATTR_ID` STRING )
>> >> ROW FORMAT DELIMITED
>> >>   FIELDS TERMINATED BY '\u0001'
>> >>   LINES TERMINATED BY '\n'
>> >>   NULL DEFINED AS ''
>> >> STORED AS INPUTFORMAT
>> >>   'org.apache.hadoop.mapred.TextInputFormat'
>> >> OUTPUTFORMAT
>> >>   'org.apache.hadoop.hive.ql.io
>> >>
>> >> .HiveIgnoreKeyTextOutputFormat'
>> >> LOCATION
>> >>   'hdfs://hdp78.ffcs.cn:8020/user/bigdata/streamcql/dist1
>> >>
>> >> ';
>> >>
>> >> there are csv files in hdfs dir user/bigdata/streamcql/dist1, the data
>> >> format in the csv file just as following
>> >>
>> >> 1939505130,171483932,287305502,813463930,20160709134396669,
>> >> 201607101469099594,1299,20160711996575390,10,73,302063,
>> 302064,127859875,
>> >> 9999999
>> >>
>> >> Step 2:I make a carbon table qqdata2.c_indextest1:
>> >> cc.sql("CREATE TABLE IF NOT EXISTS qqdata2.c_indextest1 (id STRING,
>> >> CUST_ORDER_ID STRING,ORDER_ITEM_IDATTR_ID
>> STRING,ATTR_VALUE_IDATTR_VALUE
>> >> STRING,CREATE_DATE STRING,UPDATE_DATE STRING,STATUS_CD
>> STRING,STATUS_DATE
>> >> STRING,AREA_ID STRING,REGION_CD STRING,UPDATE_STAFF STRING,CREATE_STAFF
>> >> STRING,SHARDING_ID STRING,ORDER_ATTR_ID STRING) STORED BY 'carbondata'
>> ")
>> >>
>> >> Step 3:Insert data:
>> >> cc.sql("insert into qqdata2.c_indextest1 select * from
>> >> qqdata2.h_indextest1").show(100,false);
>> >>
>> >> Step 4: Repeat from step1 to step 3, I make another carbon table
>> >> qqdata2.c_indextest2
>> >> The record number of qqdata2.c_indextest1 is 30w, the record number of
>> >> qqdata2.c_indextest2 is 1700w.
>> >>
>> >>
>> >>
>> >> yixu2001
>> >>
>> >> From: sounak
>> >> Date: 2017-10-19 14:13
>> >> To: dev
>> >> Subject: Re: Update statement failed with "Multiple input rows matched
>> >> for
>> >> same row" in version 1.2.0,
>> >> Hi Yixu,
>> >>
>> >> Can you please share the DDLs and the Data for the above problem with
>> us?
>> >>
>> >> Thanks
>> >> Sounak
>> >>
>> >> On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 &lt;
>>
>> > yixu2001@
>>
>> > &gt; wrote:
>> >>
>> >> > dev
>> >> >
>> >> >
>> >> > In carbondata version 1.2.0, I execute "update" statement with
>> >> sub-query,
>> >> > it failed.
>> >> > All the rows in the 2 tables are not duplicated, and the same
>> statement
>> >> > will succeed in carbondata version 1.1.1.
>> >> >
>> >> > The test log as following:
>> >> > scala> cc.sql("select count(*), count(distinct id) from
>> >> > qqdata2.c_indextest1").show(100,false);
>> >> > +--------+------------------+
>> >> > |count(1)|count(DISTINCT id)|
>> >> > +--------+------------------+
>> >> > |300000  |300000            |
>> >> > +--------+------------------+
>> >> >
>> >> > scala> cc.sql("select count(*), count(distinct id) from
>> >> > qqdata2.c_indextest2").show(100,false);
>> >> > +--------+------------------+
>> >> > |count(1)|count(DISTINCT id)|
>> >> > +--------+------------------+
>> >> > |71223220|71223220          |
>> >> > +--------+------------------+
>> >> >
>> >> > scala> cc.sql("update qqdata2.c_indextest2 a
>> >> set(a.CUST_ORDER_ID,a.ORDER_
>> >> > ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.
>> >> > UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.
>> >> > REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDE
>> R_ATTR_ID)
>> >> =
>> >> > (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_
>> >> > VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_
>> >> > DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.
>> >> SHARDING_ID,b.ORDER_ATTR_ID
>> >> > from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
>> >> > 17/10/18 11:32:46 WARN Utils: Truncated the string representation of
>> a
>> >> > plan since it was too large. This behavior can be adjusted by setting
>> >> > 'spark.debug.maxToStringFields' in SparkEnv.conf.
>> >> > 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn
>> >> >
>> >> > ][bigdata][Thread-1]Delete data operation is failed for
>> >> > qqdata2.c_indextest2
>> >> > 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation
>> is
>> >> > failed due to failure in creating delete delta file for segment :
>> null
>> >> > block : null
>> >> > 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in
>> >> update
>> >> > operationjava.lang.Exception: Multiple input rows matched for same
>> row.
>> >> > java.lang.RuntimeException: Update operation failed. Multiple input
>> >> rows
>> >> > matched for same row.
>> >> >   at scala.sys.package$.error(package.scala:27)
>> >> >   at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.
>> >> > processData(IUDCommands.scala:239)
>> >> >   at
>> >> org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(
>> >> > IUDCommands.scala:141)
>> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
>> >> > sideEffectResult$lzycompute(commands.scala:58)
>> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
>> >> > sideEffectResult(commands.scala:56)
>> >> >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.
>> >> > executeTake(commands.scala:71)
>> >> >   at org.apache.spark.sql.execution.CollectLimitExec.
>> >> > executeCollect(limit.scala:38)
>> >> >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$
>> >> > Dataset$$execute$1$1.apply(Dataset.scala:2378)
>> >> >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutio
>> nId(
>> >> > SQLExecution.scala:57)
>> >> >   at
>> >> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
>> >> >   at org.apache.spark.sql.Dataset.org
>> >> >
>> >> > $apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
>> >> >   at org.apache.spark.sql.Dataset.org
>> >> >
>> >> > $apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
>> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
>> >> > scala:2120)
>> >> >   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.
>> >> > scala:2119)
>> >> >   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala
>> :2810)
>> >> >   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
>> >> >   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
>> >> >   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
>> >> >   at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
>> >> >   ... 50 elided
>> >> >
>> >> >
>> >> >
>> >> > yixu2001
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks
>> >> Sounak
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks
>> > Sounak
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5
>> .nabble.com/
>>
>
>
>
> --
> Thanks
> Sounak
>



-- 
Thanks
Sounak

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Reply via email to