[ 
https://issues.apache.org/jira/browse/PHOENIX-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-3645.
-------------------------------------
    Resolution: Fixed

> Build a mechanism for creating a table and populating it with data from a 
> source table
> --------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3645
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3645
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Samarth Jain
>            Priority: Major
>
> As part of PHOENIX-1598, we are introducing the capability of mapping column 
> names and encoding column values. For users to be able to use this new 
> scheme, they would need to recreate their tables from scratch. For 
> situations like this, it would be nice to have a mechanism that creates a 
> new table and fills it with the data of an existing table. 
> A simple possibility is to disable the source table, take a snapshot of it, 
> create the new table from the snapshot of the old table, and drop the old 
> table. However, this would require downtime. 
> Another way would be to use an UPSERT INTO TARGET TABLE SELECT * FROM SOURCE 
> TABLE statement or a map reduce bulk load job. These mechanisms, though, have 
> the inherent limitation that they miss updates made to the old table after 
> they were kicked off or after they completed. To handle the case of these 
> missing updates, a somewhat crazy idea would be to mark the new table as an 
> index on the existing table. The index table would have the exact same schema 
> as the data table. Incremental changes would then be taken care of 
> automatically by our index maintenance mechanism. We could then use our 
> existing map reduce index build job to bulk load the "old" data into the new 
> table.
> There is a slight chance that we would miss updates happening to the 
> source table while we are in the process of doing the index->table conversion.
> One way to handle that would be to store the physical HBase table name for a 
> Phoenix table in SYSTEM.CATALOG. The reducer of the map reduce job would then 
> simply change this mapping in the SYSTEM.CATALOG table, which should cause 
> new updates to go to the new HBase table. 
> There are probably some edge cases or gotchas that I am not thinking of 
> right now. [~jamestaylor] probably has more thoughts on this.
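> As a rough sketch, the UPSERT SELECT route above might look like the 
> following (table and column names are hypothetical; this assumes the 
> COLUMN_ENCODED_BYTES table property introduced by PHOENIX-1598):
>
> ```sql
> -- Create the new table with column encoding enabled
> -- (COLUMN_ENCODED_BYTES comes from the PHOENIX-1598 work).
> CREATE TABLE NEW_TABLE (
>     ID BIGINT NOT NULL PRIMARY KEY,
>     V1 VARCHAR,
>     V2 INTEGER
> ) COLUMN_ENCODED_BYTES = 2;
>
> -- Copy over the existing rows. Updates landing on OLD_TABLE after this
> -- statement starts are exactly the ones this approach misses.
> UPSERT INTO NEW_TABLE SELECT * FROM OLD_TABLE;
> ```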



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
