[
https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487805#comment-15487805
]
James Taylor commented on PHOENIX-3271:
---------------------------------------
FYI, [~lhofhansl], [~singamteja] - something to think about.
> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
> Key: PHOENIX-3271
> URL: https://issues.apache.org/jira/browse/PHOENIX-3271
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
>
> Based on some informal testing we've done, it seems that creation of a local
> index is orders of magnitude faster than creation of a global index (17
> seconds versus 10-20 minutes, though more data is written in the global
> index case). Under the covers, a global index is created by running an
> UPSERT SELECT. UPSERT SELECT also provides an easy way of copying a table.
> In both of these cases, the data being upserted must all flow back to the
> same client, which can become a bottleneck for a large table. Instead, each
> separate, chunked UPSERT SELECT call could be pushed out to a different
> region server for execution there. One way we could implement this would be
> to have an endpoint coprocessor push the chunked UPSERT SELECT out to each
> region server and return the number of rows that were upserted back to the
> client.
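
As a rough illustration of the chunking idea in the description above (not
Phoenix code, and not how the coprocessor would actually be written), the
sketch below splits a table into key ranges at assumed region boundaries,
builds one UPSERT SELECT per range, runs the chunks concurrently, and sums the
per-chunk row counts. The function names, the primary key column, the split
keys, and the execute callback are all hypothetical; a thread pool merely
stands in for the region servers, and keys are interpolated into the SQL for
readability only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# A half-open key range [start, end); None means unbounded on that side.
KeyRange = Tuple[Optional[str], Optional[str]]


def key_ranges(split_keys: Sequence[str]) -> List[KeyRange]:
    """Turn sorted split points (assumed region boundaries) into key ranges."""
    bounds: List[Optional[str]] = [None, *split_keys, None]
    return list(zip(bounds[:-1], bounds[1:]))


def chunked_upsert_select(source: str, target: str, pk: str, rng: KeyRange) -> str:
    """Build the UPSERT SELECT statement covering a single key range."""
    start, end = rng
    conds = []
    if start is not None:
        conds.append(f"{pk} >= '{start}'")
    if end is not None:
        conds.append(f"{pk} < '{end}'")
    where = f" WHERE {' AND '.join(conds)}" if conds else ""
    return f"UPSERT INTO {target} SELECT * FROM {source}{where}"


def run_chunks(statements: Iterable[str],
               execute: Callable[[str], int],
               workers: int = 4) -> int:
    """Run each chunk concurrently and return the total number of rows upserted.

    `execute` is a hypothetical callback that runs one statement and returns
    its row count; in the proposal this work would happen on a region server.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(execute, statements))
```

With split keys "g" and "p", key_ranges yields three ranges, and only the
summed row count travels back to the caller rather than the upserted data
itself, which is the point of the proposal.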