James Taylor created PHOENIX-3271:
-------------------------------------

             Summary: Distribute UPSERT SELECT across cluster
                 Key: PHOENIX-3271
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
             Project: Phoenix
          Issue Type: Improvement
            Reporter: James Taylor


Based on some informal testing we've done, creating a local index appears to be 
orders of magnitude faster than creating a global index (17 seconds versus 10-20 
minutes, though more data is written in the global index case). 
Under the covers, a global index is built by running an UPSERT SELECT. UPSERT 
SELECT also provides an easy way of copying a table. In both of these cases, 
all of the data being upserted must flow back through the same client, which 
can become a bottleneck for a large table. Instead, each separate, chunked 
UPSERT SELECT call could be pushed out to a different region server and 
executed there. One way to implement this would be to have an endpoint 
coprocessor push the chunked UPSERT SELECT out to each region server and 
return the number of rows upserted to the client.
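
The proposed flow can be illustrated with a small simulation. This is only a
sketch in Python, not Phoenix or HBase code: the function names, the thread
pool standing in for region servers, and the in-memory dict standing in for
the target table are all hypothetical. It shows the one property the proposal
depends on: each chunk is read and written where it lives, and only a row
count travels back to the client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(rows, num_regions):
    """Partition sorted (key, value) rows into contiguous chunks, one per region."""
    rows = sorted(rows)
    size = max(1, -(-len(rows) // num_regions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def distributed_upsert_select(rows, num_regions, transform):
    """Run one simulated chunked UPSERT SELECT per region and sum the row counts.

    `transform` maps a source (key, value) row to a target (key, value) row,
    playing the role of the SELECT projection.
    """
    target = {}              # hypothetical stand-in for the target table
    lock = threading.Lock()  # guards concurrent writes to the shared dict

    def run_on_region(chunk):
        # Executed "on the region server": rows are transformed and written
        # here, so the row data never passes through the client.
        out = [transform(key, value) for key, value in chunk]
        with lock:
            target.update(out)
        return len(out)      # only this count is returned to the client

    regions = split_into_regions(rows, num_regions)
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        counts = list(pool.map(run_on_region, regions))
    return sum(counts), target


# Usage: build an index keyed by the value column of a ten-row source table.
source = [(i, f"name{i}") for i in range(10)]
total, index = distributed_upsert_select(source, 3, lambda k, v: (v, k))
```

Here the client only aggregates the per-region counts into `total`. In the
actual proposal the per-region work would run inside an endpoint coprocessor
rather than a local thread.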



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
