[jira] [Commented] (PHOENIX-1609) MR job to populate index tables

James Taylor (JIRA) Mon, 02 Mar 2015 10:54:30 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343562#comment-14343562
 ]


James Taylor commented on PHOENIX-1609:
---------------------------------------

Thanks for the excellent work, [~maghamravikiran]. I think we're pretty close. 
Here's some additional feedback:
- It's important that you "freeze" the timestamp for the SELECT query you're 
running, as the index will start being maintained for incremental updates to 
the data table as soon as the index is created. The initial population is 
*only* to create the rows for the index based on the data table state as of 
indexTable.getTimestamp() + 1. If you vary the timestamp, then you'd 
potentially be overwriting incremental index updates that come in right after 
the index is created. The way to freeze the timestamp is by setting the 
CURRENT_SCN connection property on the connections for the SELECT query to the 
next timestamp after the creation of the index (indexTable.getTimestamp() + 1).
- The process for initial index population is different for local indexes (see 
MetaDataClient.buildIndex()). Probably best to tackle this in a separate JIRA 
with the help of [~rajeshbabu]. With local indexes, we're able to populate the 
index completely on the server side since the table and index data are 
co-resident on the same region server.  The way we do this is by setting a 
"special" scan attribute (LOCAL_INDEX_BUILD) and then running a COUNT(*) query 
which will execute against every region of the data table. Then, on the 
server-side, when this attribute is set, we write the local index updates in 
our coprocessor. I know that aggregate queries aren't supported by the MR 
framework, but perhaps we can run a regular scan (SELECT *) and set the 
UNGROUPED_AGG attribute on the scan to trigger the same code path?
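
As a rough illustration of the first point above, the following minimal sketch 
builds connection properties that pin the SELECT query to 
indexTable.getTimestamp() + 1. The class FrozenScn and the method propsFor are 
hypothetical names used only for this example, and the property name 
"CurrentSCN" is assumed to match PhoenixRuntime.CURRENT_SCN_ATTRIB. The JDBC 
URL in the comment is a placeholder.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Phoenix.
final class FrozenScn {
    // Assumed to be the value of PhoenixRuntime.CURRENT_SCN_ATTRIB.
    static final String CURRENT_SCN = "CurrentSCN";

    // Builds connection properties that fix the query timestamp at the
    // next timestamp after the index was created.
    static Properties propsFor(long indexTableTimestamp) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(indexTableTimestamp + 1));
        return props;
    }

    public static void main(String[] args) {
        // In the real job the argument would come from indexTable.getTimestamp().
        Properties props = propsFor(1425322470000L);
        System.out.println(CURRENT_SCN + "=" + props.getProperty(CURRENT_SCN));
        // With the Phoenix JDBC driver on the classpath, every connection used
        // for the SELECT would be opened with these same properties, e.g.:
        //   DriverManager.getConnection("jdbc:phoenix:<zk-quorum>", props);
    }
}
```

Reusing the same properties for every connection the job opens is what keeps 
the timestamp from varying between mappers.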

> MR job to populate index tables 
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX-1609-4.0.patch, 
> 0001-PHOENIX-1609-wip.patch, 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables long after the data 
> exists in them.  It would be good to have a simple MR job, provided with 
> Phoenix, that users can run to bring indexes in sync with the master table. 
> Users can invoke the MR job using the following command:
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt 
> INDEX_TABLE -columns a,b,c
> Is this ideal? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
