[ https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343562#comment-14343562 ]
James Taylor commented on PHOENIX-1609:
---------------------------------------

Thanks for the excellent work, [~maghamravikiran]. I think we're pretty close. Here's some additional feedback:

- It's important that you "freeze" the timestamp for the SELECT query you're running, because incremental updates to the data table start being reflected in the index as soon as the index is created. The initial population is *only* meant to create the index rows from the data table state as of indexTable.getTimestamp() + 1. If you let the timestamp vary, you could overwrite incremental index updates that arrive right after the index is created. To freeze the timestamp, set the CURRENT_SCN connection property on the connections used for the SELECT query to the next timestamp after the index's creation (indexTable.getTimestamp() + 1).
- The process for initial index population is different for local indexes (see MetaDataClient.buildIndex()). It's probably best to tackle this in a separate JIRA with the help of [~rajeshbabu]. With local indexes, we can populate the index entirely on the server side, since the table and index data are co-resident on the same region server. We do this by setting a "special" scan attribute (LOCAL_INDEX_BUILD) and then running a COUNT(*) query, which executes against every region of the data table. On the server side, when this attribute is set, our coprocessor writes the local index updates. I know that aggregate queries aren't supported by the MR framework, but perhaps we can run a regular scan (SELECT *) and set the UNGROUPED_AGG attribute on the scan to trigger the same code path?
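A minimal sketch of the timestamp-freezing point, assuming that the CURRENT_SCN connection property is spelled `CurrentSCN` (the value of `PhoenixRuntime.CURRENT_SCN_ATTRIB` in Phoenix; check it against the Phoenix version in use). The class name, the `frozenScanProperties` helper, and the sample timestamp are hypothetical. The `indexTimestamp` parameter stands in for `indexTable.getTimestamp()`. The sketch only builds the connection properties so that it runs without Phoenix on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: builds the connection properties that pin the
// initial index-population SELECT to a single, fixed timestamp.
public class FrozenScnExample {
    // Assumed to match PhoenixRuntime.CURRENT_SCN_ATTRIB.
    static final String CURRENT_SCN_ATTRIB = "CurrentSCN";

    // indexTimestamp stands in for indexTable.getTimestamp().
    static Properties frozenScanProperties(long indexTimestamp) {
        Properties props = new Properties();
        // Use the next timestamp after index creation, and use the same value
        // for every SELECT connection so that no task reads at a moving timestamp.
        props.setProperty(CURRENT_SCN_ATTRIB, Long.toString(indexTimestamp + 1));
        return props;
    }

    public static void main(String[] args) {
        long indexTimestamp = 1425300000000L; // hypothetical value
        Properties props = frozenScanProperties(indexTimestamp);
        System.out.println(props.getProperty(CURRENT_SCN_ATTRIB));
        // With Phoenix available, these properties would be passed to
        // java.sql.DriverManager.getConnection("jdbc:phoenix:<quorum>", props)
        // for each connection that runs the population SELECT.
    }
}
```

Computing the value once and handing the same `Properties` to every task is the design point here. If each task derived its own timestamp at run time, the reads would no longer be frozen.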
> MR job to populate index tables
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX-1609-4.0.patch, 0001-PHOENIX-1609-wip.patch, 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables long after the data
> already exists in those tables. It would be good for Phoenix to provide a
> simple MR job that users can run to bring the indexes in sync with the
> master table.
> Users can invoke the MR job with the following command:
>   hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt INDEX_TABLE -columns a,b,c
> Is this ideal?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)