[ https://issues.apache.org/jira/browse/PHOENIX-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954816#comment-13954816 ]
James Taylor commented on PHOENIX-898:
--------------------------------------

Thanks for the contributions, [~james.viole...@ds-iq.com] and thanks for the review, [~prkommireddi]. +1 with the above changes - will commit as soon as the 3.0 and 4.0 releases are final.

> Extend PhoenixHBaseStorage to specify upsert columns
> ----------------------------------------------------
>
>                 Key: PHOENIX-898
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-898
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: James Violette
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX_898.patch
>
>
> We have a Phoenix table with data from multiple sources. We would like to
> write a pig script that upserts only data associated with a feed, leaving
> other data alone. The current PhoenixHBaseStorage automatically upserts all
> columns in a table.
> Given this table schema as an example,
> create TABLE IF NOT EXISTS MYSCHEMA.MYTABLE
> (NAME varchar not null
> ,D.INFO VARCHAR
> ,D.D1 DOUBLE
> ,D.I1 INTEGER
> ,D.C1 VARCHAR
> CONSTRAINT pk PRIMARY KEY (NAME));
> Assuming 'A' is loaded into pig,
> The current syntax loads all columns into MYSCHEMA.MYTABLE:
> STORE A into 'hbase://MYSCHEMA.MYTABLE' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
> We could specify upsert columns after the table in the hbase:// url.
> This column-based example is equivalent to the full table upsert.
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.D1,D.I1,D.C1' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
> This column-based example chooses to load only three of the five columns.
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
> This change would touch
> PhoenixHBaseStorage.setStoreLocation - parse the columns
> PhoenixPigConfiguration.configure - add an optional column list parameter.
> PhoenixPigConfiguration.setup - create the upsert statement and create the
> column metadata list
> The rest of the code should work as-is.
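
For illustration only, the parsing and statement-building steps described in the issue could look roughly like the sketch below. It is not taken from PHOENIX_898.patch: the class `UpsertLocationSketch` and its methods `parse` and `toUpsertSql` are hypothetical names, and the real change lives in PhoenixHBaseStorage and PhoenixPigConfiguration. The sketch assumes the column list follows the first '/' after the table name and that an empty list means "all columns", which would have to be resolved from table metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Phoenix: splits a store location such as
// "hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1" into a table name and an
// optional column list, then builds a parameterized UPSERT statement.
final class UpsertLocationSketch {
    private static final String SCHEME = "hbase://";

    final String tableName;
    final List<String> columns; // empty means no explicit column list was given

    private UpsertLocationSketch(String tableName, List<String> columns) {
        this.tableName = tableName;
        this.columns = columns;
    }

    // Parses "hbase://TABLE" or "hbase://TABLE/COL1,COL2,...".
    static UpsertLocationSketch parse(String location) {
        if (location == null || !location.startsWith(SCHEME)) {
            throw new IllegalArgumentException(
                "Expected hbase://TABLE[/COL1,COL2,...] but got: " + location);
        }
        String rest = location.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        String table = (slash < 0 ? rest : rest.substring(0, slash)).trim();
        if (table.isEmpty()) {
            throw new IllegalArgumentException("Missing table name: " + location);
        }
        List<String> cols = new ArrayList<>();
        if (slash >= 0) {
            for (String raw : rest.substring(slash + 1).split(",")) {
                String col = raw.trim();
                if (!col.isEmpty()) {
                    cols.add(col);
                }
            }
        }
        return new UpsertLocationSketch(table, Collections.unmodifiableList(cols));
    }

    // Builds the UPSERT for an explicit column list, one "?" per column.
    String toUpsertSql() {
        if (columns.isEmpty()) {
            // Assumption: the full-table case needs the column list from metadata.
            throw new IllegalStateException("No explicit columns for " + tableName);
        }
        String placeholders =
            String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + tableName
            + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        UpsertLocationSketch loc = parse("hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1");
        System.out.println(loc.toUpsertSql());
    }
}
```

With the three-column location from the issue, this produces an UPSERT naming only NAME, D.INFO and D.I1, so D.D1 and D.C1 are left out of the statement.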