Hi all,
I found an interesting problem with global secondary index updates. I am using
Phoenix 4.6.1. See the following SQL statements:
create table tt (pk1 integer not null, pk2 varchar not null, cf.a integer,
cf.b varchar constraint pk primary key (pk1, pk2));
create index tta on tt(cf.a);
create index ttb on tt(cf.b);
Now we have a data table TT and its index tables TTA and TTB. Do some UPSERT
into TT.
upsert into tt values (1, '#1', 100, '111');
For this UPSERT statement, I expect one Scan and one Put executed on the data
table TT, and one Put for each index table. But the operations that actually
took place were different.
For TT, two Scans were executed, one for CF.A and another for CF.B. I think
this should be reduced to one.
And for each index table, a useless Delete was executed: when the UPSERT ran,
the data table was empty, so there was nothing to delete in the index tables.
I added some logging to the code and dumped the Delete objects.
{"ts":1470037773335,"totalColumns":0,"families":{},"row":"\\x00\\x80\\x00\\x00\\x01#1"}
'\\x80\\x00\\x00\\x01' is PK1 and '#1' is PK2, so the leading '\\x00' must be
CF.A. Since we cannot get a value for CF.A, do we just use 0 instead? Why are
we doing this? Why not skip the Delete if we cannot get the old state?
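For reference, here is how I read the dumped row key. Phoenix serializes an
INTEGER as 4 big-endian bytes with the sign bit flipped, so '\\x80\\x00\\x00\\x01'
decodes to 1. A minimal Python sketch (decode_phoenix_int is just a helper name
of my own, not a Phoenix API):

```python
import struct

def decode_phoenix_int(b: bytes) -> int:
    """Decode a Phoenix-encoded INTEGER: 4-byte big-endian, sign bit flipped."""
    flipped = bytes([b[0] ^ 0x80]) + b[1:]
    return struct.unpack('>i', flipped)[0]

# The dumped Delete row key: \x00 (empty slot for CF.A) + PK1 + PK2
row = b'\x00\x80\x00\x00\x01#1'
indexed_value, pk1, pk2 = row[0:1], row[1:5], row[5:]

print(indexed_value)            # b'\x00' -- placeholder, CF.A has no old value
print(decode_phoenix_int(pk1))  # 1
print(pk2.decode('ascii'))      # '#1'
```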
A direct result of this behavior is that the first region of the index table
handles more than 10 times the requests of the other regions, since it receives
all the delete requests itself when the workload is 'append more, update less'.
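To illustrate the hot-spot effect: every such Delete row key begins with the
null placeholder byte, so they all sort to the very front of the index key
space and land in the first region. A rough sketch (the non-null key below is
only illustrative, not necessarily Phoenix's exact index-key encoding):

```python
# Delete row keys from the logs all start with the null placeholder for CF.A,
# so lexicographically they precede any key whose indexed INTEGER value is
# non-null (which encodes to a non-zero leading byte).
delete_keys = [b'\x00\x80\x00\x00\x01#1', b'\x00\x80\x00\x00\x02#2']

# Illustrative only: a key for a row whose CF.A has a real value (100).
non_null_key = b'\x80\x00\x00\x64\x80\x00\x00\x01#1'

# Every delete key sorts before the non-null key, so with range-based region
# splits they all hit the first region of the index table.
assert all(k < non_null_key for k in delete_keys)
```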
This wastes a lot of RPC calls and increases the response time of the data
table UPSERT operation.
Could anyone explain this to me? Thanks.
William.