[ https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662617#comment-16662617 ]
Lars Hofhansl commented on PHOENIX-4344:
----------------------------------------
We just had a discussion around that. Can we do this?
# Create input split as we do now. No change there.
# In the map function, upon the _first row_, issue the equivalent of DELETE
FROM <table> WHERE <pk> >= <split_start> AND <pk> < <split_end> AND <whatever
select predicate was specified>
# Finish the map task after the first row.
Now Phoenix can push the DELETE down into the region and be an order of
magnitude or two faster than issuing point deletes.
A nice side effect is that if there's no data in a region we won't issue any
work at all.
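
To make the proposal concrete, here is a minimal sketch of the per-split logic. It is not Phoenix code: `build_range_delete`, `FirstRowDeleteMapper`, and the `execute` callback are hypothetical names, and it assumes a single-column primary key whose values can be compared directly against the split boundaries (a composite key would need a row value constructor or similar). Open-ended boundaries are modeled as `None`.

```python
from typing import Any, Callable, List, Optional, Tuple


def build_range_delete(
    table: str,
    pk_column: str,
    split_start: Optional[Any],
    split_end: Optional[Any],
    predicate: Optional[str] = None,
) -> Tuple[str, List[Any]]:
    """Build one range DELETE covering [split_start, split_end).

    Boundaries are passed as bind parameters ('?'); a None boundary means
    the split is open-ended on that side (an assumption of this sketch).
    """
    clauses: List[str] = []
    params: List[Any] = []
    if split_start is not None:
        clauses.append(f"{pk_column} >= ?")
        params.append(split_start)
    if split_end is not None:
        clauses.append(f"{pk_column} < ?")
        params.append(split_end)
    if predicate:
        # Carry over whatever predicate the original SELECT specified.
        clauses.append(f"({predicate})")
    sql = f"DELETE FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


class FirstRowDeleteMapper:
    """Hypothetical mapper: issues the range DELETE once, on the first row."""

    def __init__(
        self,
        execute: Callable[[str, List[Any]], None],
        table: str,
        pk_column: str,
        split_start: Optional[Any],
        split_end: Optional[Any],
        predicate: Optional[str] = None,
    ) -> None:
        self.execute = execute
        self.args = (table, pk_column, split_start, split_end, predicate)
        self.done = False

    def map(self, row: Any) -> bool:
        """Handle one input row; return False to signal the task can finish."""
        if not self.done:
            sql, params = build_range_delete(*self.args)
            self.execute(sql, params)
            self.done = True
        return False
```

Because the DELETE is only triggered from `map`, a split that yields no rows never calls `execute`, which is the "no work at all" property mentioned above. In a real job the `execute` callback would presumably run the statement over a Phoenix JDBC connection and commit it.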
I think that's what James was saying in the first comment.
[~gjacoby], [~jisaac]
> MapReduce Delete Support
> ------------------------
>
> Key: PHOENIX-4344
> URL: https://issues.apache.org/jira/browse/PHOENIX-4344
> Project: Phoenix
> Issue Type: New Feature
> Affects Versions: 4.12.0
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of
> long-running SELECTs. It would be really useful to have this capability for
> long-running DELETEs, particularly of tables with indexes where using HBase's
> own MapReduce integration would be prohibitively complicated.