[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714469#comment-14714469
 ] 

James Taylor commented on PHOENIX-2154:
---------------------------------------

In theory, the MR index build for local indexes should be very quick. We know 
the split points of the index table based on the data table and we should be 
able to build one HFile per data table region in our Mapper and just hand them 
off to HBase. [~rajeshbabu] - are we taking advantage of knowing the split 
points? No reduce phase should be needed to write the HFiles, so we can take 
the same approach as we're looking at now for the front-door-HBase-APIs MR 
build: make it asynchronous and use the reduce phase only to set the index 
state to active. However, it 
seems like there'd be corner cases in which the data table may split while the 
index is being built - it's unclear to me how this scenario would be handled.
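
To make the "one HFile per data table region" idea concrete, here is a rough 
sketch in plain Java. It is not Phoenix or HBase code: the class 
RegionBuckets and its methods are hypothetical names, and it assumes the 
region start keys are already known, sorted, and begin with the empty key for 
the first region. It only shows how a mapper could decide which region (and 
so which output file) a row belongs to; it also assumes the split points stay 
fixed, so it does not address the split-during-build corner case.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Phoenix or HBase.
final class RegionBuckets {

    // Unsigned lexicographic comparison of byte arrays, which is the
    // ordering HBase applies to row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the region whose [startKey, nextStartKey) range
    // contains rowKey. Assumes startKeys is sorted ascending and that
    // startKeys[0] is the empty key.
    static int regionIndexFor(byte[][] startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareKeys(startKeys[mid], rowKey) <= 0) {
                lo = mid;       // rowKey is at or past this start key
            } else {
                hi = mid - 1;   // rowKey sorts before this region
            }
        }
        return lo;
    }

    // Groups row keys into one bucket per region; each bucket stands in for
    // the contents of one per-region output file.
    static List<List<byte[]>> bucketByRegion(byte[][] startKeys, List<byte[]> rowKeys) {
        List<List<byte[]>> buckets = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (byte[] key : rowKeys) {
            buckets.get(regionIndexFor(startKeys, key)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        byte[][] startKeys = {
            new byte[0],
            "g".getBytes(StandardCharsets.UTF_8),
            "p".getBytes(StandardCharsets.UTF_8),
        };
        List<byte[]> rows = new ArrayList<>();
        for (String s : new String[] {"apple", "grape", "kiwi", "zebra"}) {
            rows.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<List<byte[]>> buckets = bucketByRegion(startKeys, rows);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("region " + i + ": " + buckets.get(i).size() + " row(s)");
        }
    }
}
```

Because every row lands in exactly one bucket and the buckets follow the 
region boundaries, nothing has to be shuffled or merged afterwards, which is 
why a reduce phase should not be needed for the data itself.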

Also, it seems there are problems with IndexTool for local indexes, as there 
are scenarios where the MR completes yet a scan over the Phoenix tables says 
there are 0 rows. Will follow up on this in a separate JIRA, [~rajeshbabu]. 
Note that the code path for the MR index build differs from the standard 
build mechanism in Phoenix, so I think we need to increase our testing in this 
area.

> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: IndexTool.java, PHOENIX-2154-WIP.patch, 
> PHOENIX-2154-_HBase_Frontdoor_API_WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows getting 
> added *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
