[
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343562#comment-14343562
]
James Taylor commented on PHOENIX-1609:
---------------------------------------
Thanks for the excellent work, [~maghamravikiran]. I think we're pretty close.
Here's some additional feedback:
- It's important that you "freeze" the timestamp for the SELECT query you're
running, because incremental index maintenance for writes to the data table
starts as soon as the index is created. The initial population is *only* to
create the rows for the index based on the current data table state (as of
indexTable.getTimestamp() + 1). If you vary the timestamp, you could overwrite
incremental index updates that come in right after the index is created. To
freeze the timestamp, set the CURRENT_SCN connection property on the
connections used for the SELECT query to the next timestamp after the creation
of the index (indexTable.getTimestamp() + 1).
- The process for initial index population is different for local indexes (see
MetaDataClient.buildIndex()). Probably best to tackle this in a separate JIRA
with the help of [~rajeshbabu]. With local indexes, we're able to populate the
index completely on the server side since the table and index data are
co-resident on the same region server. The way we do this is by setting a
"special" scan attribute (LOCAL_INDEX_BUILD) and then running a COUNT(*) query
which will execute against every region of the data table. Then, on the
server-side, when this attribute is set, we write the local index updates in
our coprocessor. I know that aggregate queries aren't supported by the MR
framework, but perhaps we can run a regular scan (SELECT *) and set the
UNGROUPED_AGG attribute on the scan to trigger the same code path?
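
To illustrate the first point, here is a minimal sketch of freezing the
timestamp on the connection used for the SELECT. It assumes the CURRENT_SCN
property key is the string "CurrentSCN" (what PhoenixRuntime.CURRENT_SCN_ATTRIB
is understood to resolve to), and that indexTimestamp stands in for
indexTable.getTimestamp(). The class and method names are hypothetical and are
not part of the patch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class FrozenScnExample {
    // Assumed key of Phoenix's CURRENT_SCN connection property.
    static final String CURRENT_SCN = "CurrentSCN";

    // Builds connection properties that pin reads to indexTimestamp + 1,
    // i.e. the first timestamp after the index was created.
    static Properties frozenScnProps(long indexTimestamp) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(indexTimestamp + 1));
        return props;
    }

    // Opens a connection whose queries all see the same frozen timestamp.
    // Requires the Phoenix JDBC driver on the classpath; jdbcUrl is
    // supplied by the caller (hypothetical helper, not called in main).
    static Connection openFrozenConnection(String jdbcUrl, long indexTimestamp)
            throws SQLException {
        return DriverManager.getConnection(jdbcUrl, frozenScnProps(indexTimestamp));
    }

    public static void main(String[] args) {
        // Illustrative index creation timestamp, in milliseconds.
        Properties props = frozenScnProps(1425340800000L);
        System.out.println(CURRENT_SCN + "=" + props.getProperty(CURRENT_SCN));
    }
}
```

Every connection that the job opens for the SELECT would need to be created
with the same properties, so that all mappers read the data table as of the
same point in time.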
> MR job to populate index tables
> --------------------------------
>
> Key: PHOENIX-1609
> URL: https://issues.apache.org/jira/browse/PHOENIX-1609
> Project: Phoenix
> Issue Type: New Feature
> Reporter: maghamravikiran
> Assignee: maghamravikiran
> Attachments: 0001-PHOENIX-1609-4.0.patch,
> 0001-PHOENIX-1609-wip.patch, 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables long after the data
> exists in those tables. It would be good to have a simple MR job, provided
> by Phoenix, that users can call to bring indexes in sync with the
> master table.
> Users can invoke the MR job using the following command:
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
> INDEX_TABLE -columns a,b,c
> Is this ideal?