Lars Hofhansl created PHOENIX-5688:
--------------------------------------

             Summary: Investigate better server work pacing
                 Key: PHOENIX-5688
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5688
             Project: Phoenix
          Issue Type: Bug
            Reporter: Lars Hofhansl


[~kozdemir] shared an intriguing idea that he used for the server-side index 
repair tool, and which would apply equally well to server-side deletes and 
server-side UPSERT/SELECT.

The main problem with the current implementation is that we basically send a 
predicate to the server - DELETE FROM <table> WHERE <condition>. The 
server(s) then go off and, for each region chunk, evaluate the condition and 
delete whatever matches it, all in a tight server-side loop.

The downsides are that (a) a server thread is held up arbitrarily long, (b) the 
server has no way to do any fair queuing because the loop has to finish, and 
(c) if the server takes too long the client will just time out.

The alternative used to be to do the work on the client instead: issue a scan 
with the condition to the server, retrieve the matching row IDs on the client, 
and then issue nice chunks of deletes back to the server.
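
As a rough illustration of that client-side alternative, here is a minimal Python sketch. It is not Phoenix code: the table is a plain in-memory dict, and scan_matching_ids and client_side_delete are hypothetical names made up for this example. It only shows the shape of the flow: every matching ID travels to the client, which then sends deletes back in chunks.

```python
def scan_matching_ids(table, predicate):
    """Stand-in for a scan with the condition pushed to the server (hypothetical)."""
    for row_id, row in table.items():
        if predicate(row):
            yield row_id


def client_side_delete(table, predicate, chunk_size):
    """Delete matching rows in client-issued chunks.

    Returns the number of delete batches the client had to send.
    """
    # Every matching ID crosses the wire to the client first.
    ids = list(scan_matching_ids(table, predicate))
    batches = 0
    for start in range(0, len(ids), chunk_size):
        # One "nice chunk" of deletes sent back to the server.
        for row_id in ids[start:start + chunk_size]:
            del table[row_id]
        batches += 1
    return batches
```

The list of IDs is the extra client/server traffic this approach pays for in exchange for bounded work per request.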

The downside here is the extra communication overhead between the server and 
client (which might be especially taxing for UPSERT/SELECT).

Kadir's approach is a middle ground:
 # Issue a scan from the client, and send along a chunk size (N rows) when 
getting the scanner.
 # The server will do N rows' worth of work, then return.
 # The client keeps the scanner open, and calls next.
 # Go to step 2.
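
The steps above can be sketched as follows. This is a toy Python model, not the actual Phoenix or HBase API: PacedDeleteScanner and paced_delete are hypothetical names, the table is an in-memory dict, and "N rows' worth of work" is assumed here to mean N rows examined per call (it could equally mean N rows deleted).

```python
from itertools import islice


class PacedDeleteScanner:
    """Hypothetical server-side scanner doing at most chunk_size rows of work per call."""

    def __init__(self, table, predicate, chunk_size):
        self._table = table
        self._predicate = predicate
        self._chunk_size = chunk_size
        # Snapshot of row keys in scan order; the scanner stays open between calls.
        self._pending = iter(sorted(table))

    def next(self):
        """Examine up to chunk_size rows, delete the matches, return (deleted, more)."""
        chunk = list(islice(self._pending, self._chunk_size))
        deleted = 0
        for key in chunk:
            if self._predicate(self._table[key]):
                del self._table[key]
                deleted += 1
        # A short chunk means the scan ran out of rows.
        more = len(chunk) == self._chunk_size
        return deleted, more


def paced_delete(table, predicate, chunk_size):
    """Client loop: keep calling next() until the server reports no more rows."""
    scanner = PacedDeleteScanner(table, predicate, chunk_size)
    total, calls, more = 0, 0, True
    while more:
        deleted, more = scanner.next()  # server returns after one chunk of work
        total += deleted
        calls += 1
    return total, calls
```

Only a small status tuple crosses the wire per call, and between calls the server is free to schedule other requests.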

This way we get the benefits of both approaches: (1) the work happens close to 
where the data is, and (2) the client can pace the work and the server gets a 
chance to schedule other work.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
