[
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699958#comment-14699958
]
Gabriel Reid commented on PHOENIX-2154:
---------------------------------------
One thing to keep in mind about the trade-off between writing HFiles vs writing
directly to HBase is the flush and compaction overhead that can come from
such high-throughput writing. In the case of writing index entries there might
not be too big of a problem with this, but I've definitely seen large flush and
compaction queues get generated due to writing lots of heavy-weight rows via a
MR job (with the resolution to this issue always being writing HFiles).
> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Attachments: IndexTool.java
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done
> in the event of the failure of one of the other mappers. The initial
> population of an index is based on a snapshot in time, so new rows added
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so
> there's really no need to dedup. However, the index rows will have a
> different row key than the data table, so I'm not sure how the HFiles are
> split. Will they potentially overlap and is this an issue?