Afshin Moazami created PHOENIX-2521:
---------------------------------------

             Summary: Index rows are not updated when the index key updated 
using bulk loader 
                 Key: PHOENIX-2521
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2521
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.5.2
            Reporter: Afshin Moazami


I found that the MapReduce CSV bulk load tool doesn't behave the same as 
UPSERTs. Is this by design, or is it a bug?

Here are the queries for creating the table and the index:

{code} CREATE TABLE mySchema.mainTable (
id varchar NOT NULL,
name varchar,
address varchar
CONSTRAINT pk PRIMARY KEY (id)); {code}


{code} CREATE INDEX myIndex 
ON mySchema.mainTable  (name, id) 
INCLUDE (address); {code}

If I execute two UPSERTs where the second one updates the name (which is part 
of the index key), everything works fine: the record is updated in both the 
main table and the index table.

{code} UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'john', 
'Montreal');{code}
{code}UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'jack', 
'Montreal');{code}

{code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable 
where name = 'jack'; {code}  ==> one record
{code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable 
where name = 'john';  {code}  ==> zero records

But if I load the data into the main table using 
org.apache.phoenix.mapreduce.CsvBulkLoadTool, it behaves differently. The main 
table is updated, but the new record is appended to the index table and the 
old index row is left in place:

HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol-1.1.2.jar:/etc/hbase/conf 
hadoop jar  
/usr/lib/hbase/phoenix-4.5.2-HBase-1.1-bin/phoenix-4.5.2-HBase-1.1-client.jar 
org.apache.phoenix.mapreduce.CsvBulkLoadTool -d',' -s mySchema -t mainTable -i 
/tmp/input.txt 

input.txt:
2,tomas,montreal
2,george,montreal

(I have tried it both with and without -it and got the same result.)

{code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable 
where name = 'tomas' {code} ==> one record;

{code} SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from 
mySchema.mainTable where name = 'george' {code} ==> one record;
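
To make the inconsistency easier to see, the row counts of the main table and 
the index table can be compared directly. This is only a sketch: it assumes 
the index table can be queried by name as mySchema.myIndex, and it uses the 
NO_INDEX hint so that the first count is served from the main table rather 
than the index.

{code} SELECT /*+ NO_INDEX */ COUNT(*) FROM mySchema.mainTable; {code}
{code} SELECT COUNT(*) FROM mySchema.myIndex; {code}

If the index were in sync, both counts would be equal. Given the results 
above, I would expect the second count to be larger, because the stale 'tomas' 
row is still present in the index.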



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
