[ https://issues.apache.org/jira/browse/PHOENIX-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955559#comment-13955559 ]
Prashant Kommireddi commented on PHOENIX-898:
---------------------------------------------

Thanks [~jamestaylor], that's good info.

[~jviolettedsiq], FYI: you could use {code}org.apache.pig.builtin.mock.Storage{code} to help with your tests (comparing data in and out). Take a look at https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/test/java/parquet/pig/TestParquetStorer.java for some example usage.

> Extend PhoenixHBaseStorage to specify upsert columns
> ----------------------------------------------------
>
>                 Key: PHOENIX-898
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-898
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: James Violette
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX_898.patch
>
>
> We have a Phoenix table with data from multiple sources. We would like to
> write a Pig script that upserts only the data associated with a feed, leaving
> other data alone. The current PhoenixHBaseStorage automatically upserts all
> columns in a table.
>
> Given this table schema as an example,
> create TABLE IF NOT EXISTS MYSCHEMA.MYTABLE
> (NAME varchar not null
> ,D.INFO VARCHAR
> ,D.D1 DOUBLE
> ,D.I1 INTEGER
> ,D.C1 VARCHAR
> CONSTRAINT pk PRIMARY KEY (NAME));
>
> Assuming 'A' is loaded into Pig, the current syntax loads all columns into
> MYSCHEMA.MYTABLE:
> STORE A into 'hbase://MYSCHEMA.MYTABLE' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> We could specify the upsert columns after the table name in the hbase:// URL.
> This column-based example is equivalent to the full-table upsert:
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.D1,D.I1,D.C1' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> This column-based example loads only three of the five columns:
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> This change would touch:
> PhoenixHBaseStorage.setStoreLocation - parse the columns
> PhoenixPigConfiguration.configure - add an optional column list parameter
> PhoenixPigConfiguration.setup - create the upsert statement and the column
> metadata list
> The rest of the code should work as-is.
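The parsing step proposed for setStoreLocation, and the statement-building step proposed for setup, could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code in PHOENIX_898.patch: the class UpsertLocationParser, its fields and its buildUpsert method are hypothetical names, and it assumes the column list is simply comma-separated after the first "/" following the table name. It only produces a parameterized Phoenix UPSERT INTO ... VALUES (?, ...) string; binding tuple values and reading column types from table metadata are left out. An empty column list stands for the existing "upsert all columns" behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Phoenix): splits a store location such as
 * "hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1" into a table name and an optional
 * column list, and builds a parameterized Phoenix UPSERT statement from them.
 */
public class UpsertLocationParser {
    private static final String PREFIX = "hbase://";

    final String tableName;
    // An empty list means "no explicit columns", i.e. the existing full-table upsert.
    final List<String> columns;

    UpsertLocationParser(String location) {
        if (!location.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Expected location to start with " + PREFIX + ": " + location);
        }
        String rest = location.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            // Current syntax: table name only.
            tableName = rest;
            columns = Collections.emptyList();
        } else {
            // Proposed syntax: table name, "/", comma-separated column names.
            tableName = rest.substring(0, slash);
            List<String> cols = new ArrayList<>();
            for (String c : rest.substring(slash + 1).split(",")) {
                String trimmed = c.trim();
                if (!trimmed.isEmpty()) {
                    cols.add(trimmed);
                }
            }
            columns = Collections.unmodifiableList(cols);
        }
    }

    /** Builds "UPSERT INTO T (C1, C2) VALUES (?, ?)" from the explicit column list. */
    String buildUpsert() {
        if (columns.isEmpty()) {
            // Assumption: in this case the real storer would take all columns from table metadata.
            throw new IllegalStateException("No explicit columns; use the full-table upsert path");
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + tableName + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        UpsertLocationParser p = new UpsertLocationParser("hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1");
        System.out.println(p.tableName);     // MYSCHEMA.MYTABLE
        System.out.println(p.columns);       // [NAME, D.INFO, D.I1]
        System.out.println(p.buildUpsert()); // UPSERT INTO MYSCHEMA.MYTABLE (NAME, D.INFO, D.I1) VALUES (?, ?, ?)
    }
}
```

Keeping the column list out of the table name and in the URL path means the existing 'hbase://MYSCHEMA.MYTABLE' form still parses unchanged, which matches the issue's goal of leaving the rest of the code as-is.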