James Violette created PHOENIX-898:
--------------------------------------

             Summary: Extend PhoenixHBaseStorage to specify upsert columns
                 Key: PHOENIX-898
                 URL: https://issues.apache.org/jira/browse/PHOENIX-898
             Project: Phoenix
          Issue Type: Improvement
    Affects Versions: 3.0.0
            Reporter: James Violette
             Fix For: 3.0.0

We have a Phoenix table with data from multiple sources. We would like to write a Pig script that upserts only the data associated with one feed, leaving the other data alone. The current PhoenixHBaseStorage automatically upserts all columns in a table.

Given this table schema as an example:

    CREATE TABLE IF NOT EXISTS MYSCHEMA.MYTABLE (
        NAME VARCHAR NOT NULL
        ,D.INFO VARCHAR
        ,D.D1 DOUBLE
        ,D.I1 INTEGER
        ,D.C1 VARCHAR
        CONSTRAINT pk PRIMARY KEY (NAME));

Assuming 'A' is loaded into Pig, the current syntax loads all columns into MYSCHEMA.MYTABLE:

    STORE A into 'hbase://MYSCHEMA.MYTABLE' using org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');

We could specify the upsert columns after the table name in the hbase:// URL. This column-based example is equivalent to the full-table upsert:

    STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.D1,D.I1,D.C1' using org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');

This column-based example loads only three of the five columns:

    STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1' using org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');

This change would touch:

    PhoenixHBaseStorage.setStoreLocation - parse the columns
    PhoenixPigConfiguration.configure - add an optional column list parameter
    PhoenixPigConfiguration.setup - create the upsert statement and the column metadata list

The rest of the code should work as-is.
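
The parsing and statement-building steps listed above could look roughly like the following. This is only a sketch of the proposal, not existing Phoenix code: the class UpsertColumnsSketch and its methods are hypothetical names, and it assumes that an hbase:// location without a '/' after the table name keeps today's "all columns" behavior. The real change would live in PhoenixHBaseStorage and PhoenixPigConfiguration and would also need to build the column metadata list and handle identifier quoting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Phoenix: illustrates the proposed
// 'hbase://TABLE/COL1,COL2,...' location format only.
class UpsertColumnsSketch {
    static final String SCHEME = "hbase://";

    // Table name: the text between "hbase://" and the first '/',
    // or everything after the scheme when no column list is given.
    static String tableName(String location) {
        String rest = stripScheme(location);
        int slash = rest.indexOf('/');
        return (slash < 0 ? rest : rest.substring(0, slash)).trim();
    }

    // Column names after the first '/', split on commas.
    // An empty list is assumed to mean "upsert all columns" (current behavior).
    static List<String> columns(String location) {
        String rest = stripScheme(location);
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return Collections.emptyList();
        }
        List<String> cols = new ArrayList<>();
        for (String c : rest.substring(slash + 1).split(",")) {
            String name = c.trim();
            if (!name.isEmpty()) {
                cols.add(name);
            }
        }
        return cols;
    }

    // Builds a parameterized UPSERT with one '?' placeholder per listed column.
    static String upsertStatement(String table, List<String> columns) {
        if (columns.isEmpty()) {
            // Without table metadata this sketch cannot know the column count.
            throw new IllegalArgumentException("explicit column list required");
        }
        String placeholders =
            String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table
            + " (" + String.join(", ", columns) + ")"
            + " VALUES (" + placeholders + ")";
    }

    private static String stripScheme(String location) {
        if (!location.startsWith(SCHEME)) {
            throw new IllegalArgumentException(
                "Expected an hbase:// location: " + location);
        }
        return location.substring(SCHEME.length());
    }

    public static void main(String[] args) {
        String location = "hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1";
        // prints: UPSERT INTO MYSCHEMA.MYTABLE (NAME, D.INFO, D.I1) VALUES (?, ?, ?)
        System.out.println(upsertStatement(tableName(location), columns(location)));
    }
}
```

Keeping the column list in the location string means existing scripts that pass only 'hbase://MYSCHEMA.MYTABLE' need no changes, since the list is simply empty in that case.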