[
https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236907#comment-16236907
]
Geoffrey Jacoby commented on PHOENIX-4344:
------------------------------------------
I don't see how Option 1 is problematic for indexes on non-PK columns: because
it internally uses the Phoenix JDBC API, it goes through all the same
index-handling logic that a point-delete query issued from outside MapReduce
would go through.
Let's say that I have a table ENTITY_HISTORY with a compound primary key (Key1,
Key2).
I create my MapReduce job with a query like "DELETE FROM ENTITY_HISTORY WHERE
Key1 > 'aaa'"
That delete would be converted to a SELECT, and the MapReduce job would iterate
row by row over the result set. For each row, a new DELETE query would be built
using that row's PK, e.g. "DELETE FROM ENTITY_HISTORY WHERE Key1 = 'foo' AND
Key2 = 'bar'", and executed using a PhoenixConnection (probably with some kind
of commit batching).
I'm somewhat concerned about the perf, but the correctness seems sound to me --
am I missing an issue?
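To make the proposed flow concrete, here is a minimal sketch of the per-row point-delete construction. The class and method names (PointDeleteSketch, buildPointDelete) are illustrative only, not Phoenix APIs; the JDBC loop in the comment is an assumed outline of what the mapper would do, not actual Phoenix MapReduce code.

```java
// Illustrative sketch only: PointDeleteSketch and buildPointDelete are
// hypothetical names, not part of Phoenix. Shows how a point-delete
// statement could be derived from a row's PK columns.
public class PointDeleteSketch {

    // Build a parameterized point-delete for one row, keyed by its PK columns.
    static String buildPointDelete(String table, String... pkCols) {
        StringBuilder sql = new StringBuilder("DELETE FROM ").append(table).append(" WHERE ");
        for (int i = 0; i < pkCols.length; i++) {
            if (i > 0) sql.append(" AND ");
            sql.append(pkCols[i]).append(" = ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = buildPointDelete("ENTITY_HISTORY", "KEY1", "KEY2");
        if (!sql.equals("DELETE FROM ENTITY_HISTORY WHERE KEY1 = ? AND KEY2 = ?")) {
            throw new AssertionError(sql);
        }
        System.out.println(sql);

        /* Assumed shape of the mapper's loop (pseudo-outline, commit batching
           every N rows; the JDBC URL and batch size are placeholders):

           try (Connection conn = DriverManager.getConnection("jdbc:phoenix:<zkQuorum>");
                PreparedStatement del = conn.prepareStatement(sql)) {
               conn.setAutoCommit(false);
               // for each row in the result set of the converted SELECT:
               //     del.setObject(1, key1);
               //     del.setObject(2, key2);
               //     del.executeUpdate();
               //     if (++count % BATCH_SIZE == 0) conn.commit();
               conn.commit();
           }
        */
    }
}
```

Because each delete goes through a PreparedStatement on a PhoenixConnection, index maintenance happens exactly as it would for any client-issued point delete.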
> MapReduce Delete Support
> ------------------------
>
> Key: PHOENIX-4344
> URL: https://issues.apache.org/jira/browse/PHOENIX-4344
> Project: Phoenix
> Issue Type: New Feature
> Affects Versions: 4.12.0
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of
> long-running SELECTs. It would be really useful to have this capability for
> long-running DELETEs, particularly of tables with indexes where using HBase's
> own MapReduce integration would be prohibitively complicated.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)