[jira] [Comment Edited] (PHOENIX-898) Extend PhoenixHBaseStorage to specify upsert columns

James Violette (JIRA) Mon, 31 Mar 2014 12:09:12 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955488#comment-13955488 ]


James Violette edited comment on PHOENIX-898 at 3/31/14 7:07 PM:
-----------------------------------------------------------------

Thanks for the comments.  I plan on getting most of them resolved today. I can 
look at setting up the parsing unit tests, but am not sure how to get the 
integration tests done.

The column family is a core HBase concept, which is central to distributing 
row-based workloads across servers. In addition, column names are unique within 
a column family, but that uniqueness is not shared across column families.  
Therefore, it is important for our HBase users to access column families 
through the Phoenix API.  

We can use the same column name resolution approach as in PHOENIX-902, which 
allows both bare columns and family-qualified columns.  I may not have time 
right now to consolidate the logic into a shared class, but I do have time to 
implement it here.  The column name resolution logic could then be moved into 
a shared location.
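
As a rough illustration of what accepting both bare and family-qualified 
columns could look like, here is a minimal sketch.  ColumnRef is a 
hypothetical class, not part of Phoenix and not the PHOENIX-902 code, and it 
assumes plain identifiers only: quoted or case-sensitive names are not handled.

```java
import java.util.Optional;

/** Hypothetical holder for a column reference; not a Phoenix class. */
final class ColumnRef {
    final Optional<String> family; // empty for a bare column such as NAME
    final String name;

    private ColumnRef(Optional<String> family, String name) {
        this.family = family;
        this.name = name;
    }

    /** Parses "NAME" (bare) or "D.INFO" (family-qualified). */
    static ColumnRef parse(String token) {
        String t = token.trim();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty column name");
        }
        int dot = t.indexOf('.');
        if (dot < 0) {
            return new ColumnRef(Optional.empty(), t);
        }
        // Reject ".INFO", "D." and names with more than one dot.
        if (dot == 0 || dot == t.length() - 1 || t.indexOf('.', dot + 1) >= 0) {
            throw new IllegalArgumentException("malformed column name: " + token);
        }
        return new ColumnRef(Optional.of(t.substring(0, dot)), t.substring(dot + 1));
    }

    @Override
    public String toString() {
        return family.map(f -> f + "." + name).orElse(name);
    }
}
```

A bare name would still have to be matched against the table metadata to find 
its column family; that lookup is left out here.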




> Extend PhoenixHBaseStorage to specify upsert columns
> ----------------------------------------------------
>
>                 Key: PHOENIX-898
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-898
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: James Violette
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX_898.patch
>
>
> We have a Phoenix table with data from multiple sources.  We would like to 
> write a Pig script that upserts only the data associated with a feed, leaving 
> other data alone.  The current PhoenixHBaseStorage automatically upserts all 
> columns in a table.
>
> Given this table schema as an example:
>
> create TABLE IF NOT EXISTS MYSCHEMA.MYTABLE
>  (NAME varchar not null
>   ,D.INFO VARCHAR
>   ,D.D1 DOUBLE
>   ,D.I1 INTEGER
>   ,D.C1 VARCHAR
>  CONSTRAINT pk PRIMARY KEY (NAME));
>
> Assuming 'A' is loaded into Pig, the current syntax loads all columns into 
> MYSCHEMA.MYTABLE:
>
> STORE A into 'hbase://MYSCHEMA.MYTABLE' using
>     org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> We could specify upsert columns after the table in the hbase:// URL.  
> This column-based example is equivalent to the full table upsert:
>
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.D1,D.I1,D.C1' using
>     org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> This column-based example loads only three of the five columns:
>
> STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1' using
>     org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
>
> This change would touch:
> - PhoenixHBaseStorage.setStoreLocation - parse the columns
> - PhoenixPigConfiguration.configure - add an optional column list parameter
> - PhoenixPigConfiguration.setup - create the upsert statement and the 
>   column metadata list
>
> The rest of the code should work as-is.
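
The parsing and statement-building steps described above might look roughly 
like the sketch below.  UpsertLocation and its methods are hypothetical names 
made up for illustration; this is not the attached patch and not the actual 
PhoenixHBaseStorage or PhoenixPigConfiguration code.  It assumes the table and 
column names contain no '/' or ',' characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: splits 'hbase://TABLE/COL1,COL2' into its parts. */
final class UpsertLocation {
    static final String SCHEME = "hbase://";

    final String table;
    final List<String> columns; // empty when no column list was given

    private UpsertLocation(String table, List<String> columns) {
        this.table = table;
        this.columns = columns;
    }

    static UpsertLocation parse(String location) {
        if (!location.startsWith(SCHEME)) {
            throw new IllegalArgumentException("expected " + SCHEME + " location: " + location);
        }
        String rest = location.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        String table = slash < 0 ? rest : rest.substring(0, slash);
        if (table.isEmpty()) {
            throw new IllegalArgumentException("missing table name: " + location);
        }
        List<String> columns = new ArrayList<>();
        if (slash >= 0) {
            // Everything after the first '/' is the comma-separated column list.
            for (String token : rest.substring(slash + 1).split(",")) {
                String column = token.trim();
                if (!column.isEmpty()) {
                    columns.add(column);
                }
            }
        }
        return new UpsertLocation(table, columns);
    }

    /** Builds a parameterized UPSERT for the explicit column list. */
    String toUpsertSql() {
        if (columns.isEmpty()) {
            // The all-columns case needs table metadata, which this sketch does not have.
            throw new IllegalStateException("no column list given for " + table);
        }
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + String.join(",", columns)
                + ") VALUES (" + placeholders + ")";
    }
}
```

With the three-column location from the description, parse would yield the 
table MYSCHEMA.MYTABLE and the columns NAME, D.INFO and D.I1, and toUpsertSql 
would produce a statement with one placeholder per column.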



--
This message was sent by Atlassian JIRA
(v6.2#6252)
