[jira] [Commented] (PHOENIX-3271) Distribute UPSERT SELECT across cluster

Ankit Singhal (JIRA) Fri, 27 Jan 2017 07:26:42 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842997#comment-15842997
 ]


Ankit Singhal commented on PHOENIX-3271:
----------------------------------------

Thanks [~enis] for looking into this.
bq. Rajeshbabu Chintaguntla has a patch which changes the RPC scheduler to be 
configured programmatically from the server side, related to PHOENIX-3360. Do 
we need that patch in before this?
I think yes, because it is now easy to get into a deadlock if our RPC scheduler 
is not present. So loading it automatically on the region server (using the 
[[email protected]] short-term fix, as per [~devaraj]'s 
[comment|https://issues.apache.org/jira/browse/PHOENIX-3360?focusedCommentId=15553907&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15553907]
) would be better than depending on the user to set the configuration 
explicitly. [[email protected]], can you push the same as part of a new 
JIRA if PHOENIX-3360 needs to be kept open for the long-term fix?

bq. One other thing is that these scan RPCs will take a longer time and will 
timeout, and retried from the client, causing the worst case behaviour to be 
pretty bad user experience. Do we have any plans for dealing with that?
As [~giacomotaylor] said, we should not expect RPC timeouts after the 
[[email protected]] implementation (PHOENIX-2357).

Can I push this patch now if there are no further comments, 
[~giacomotaylor]/[~enis]?

After this, I'll take up the long-term scheduler fixes suggested by [~enis] 
as part of another JIRA.


> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch, 
> PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch, PHOENIX-3271_v4.patch, 
> PHOENIX-3271_v5.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local 
> index is orders of magnitude faster than creation of global indexes (17 
> seconds versus 10-20 minutes - though more data is written in the global 
> index case). Under the covers, a global index is created through the running 
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a 
> table. In both of these cases, the data being upserted must all flow back to 
> the same client which can become a bottleneck for a large table. Instead, 
> what can be done is to push each separate, chunked UPSERT SELECT call out to 
> a different region server for execution there. One way we could implement 
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT 
> out to each region server and return the number of rows that were upserted 
> back to the client.
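
The fan-out described in the issue can be sketched roughly as follows. This is 
only an in-memory illustration, not Phoenix or HBase code: the class 
DistributedUpsertSketch, the Region type, and the methods upsertSelectLocally 
and distribute are hypothetical names invented for this example, and a thread 
pool stands in for the region servers. Each "region" copies its own chunk of 
rows locally and hands back only a row count, which the client then sums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DistributedUpsertSketch {

    /** Hypothetical stand-in for one region's chunk of the source table. */
    static final class Region {
        final List<String> sourceRows;
        final List<String> targetRows = new ArrayList<>();

        Region(List<String> sourceRows) {
            this.sourceRows = sourceRows;
        }

        /** Copies this region's rows locally and returns only the row count. */
        long upsertSelectLocally() {
            for (String row : sourceRows) {
                targetRows.add(row); // the row data never travels to the client
            }
            return sourceRows.size();
        }
    }

    /** Submits one task per region and sums the counts that come back. */
    static long distribute(List<Region> regions) throws Exception {
        // The pool is an assumption that plays the role of the region servers.
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Region region : regions) {
                Callable<Long> task = region::upsertSelectLocally;
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // only a count crosses back per region
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Region> regions = List.of(
            new Region(List.of("a", "b", "c")),
            new Region(List.of("d", "e")),
            new Region(List.of("f")));
        System.out.println("rows upserted: " + distribute(regions)); // 6
    }
}
```

The point of the sketch is that the client-side cost is one small number per 
region rather than the full row set, which is the bottleneck the issue 
describes for large tables.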



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
