Lars Hofhansl created PHOENIX-5688:
--------------------------------------
Summary: Investigate better server work pacing
Key: PHOENIX-5688
URL: https://issues.apache.org/jira/browse/PHOENIX-5688
Project: Phoenix
Issue Type: Bug
Reporter: Lars Hofhansl
[~kozdemir] shared an intriguing idea that he used for the server-side index
repair tool, and which would apply equally well to server-side deletes and
server-side UPSERT/SELECT.
The main problem with the current implementation is that we basically send a
predicate to the server - DELETE FROM <table> WHERE <condition>. Each server
then goes off and, region by region, evaluates the condition and deletes
whatever matches... all in a tight server-side loop.
The downsides are that (a) a server thread is held up arbitrarily long, (b) there
is no way for the server to do any fair queuing, since the loop has to finish,
and (c) if the server takes too long the client will just time out.
The alternative used to be to do the work on the client instead: Issue a scan
with the condition to the server, retrieve the IDs to the client, and then
issue nice chunks of deletes back to the server.
The downside here is the extra communication overhead between the server and
client (which might be especially taxing for UPSERT/SELECT).
Kadir's approach is a middle ground:
# Issue a scan from the client, and send along a chunk size (N rows) when
getting the scanner.
# The server will do N rows worth of work, then return.
# The client keeps the scanner open, and calls next.
# Go to step 2, until the scan is exhausted.
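
The loop above can be sketched as a small in-memory simulation. This is only an illustration of the pacing idea, not Phoenix or HBase code: PacedRegionScanner, ChunkResult and paced_delete are hypothetical names, and a plain dict stands in for one region's rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ChunkResult:
    rows_examined: int  # rows the "server" looked at during this call
    rows_deleted: int   # rows that matched the predicate and were deleted
    exhausted: bool     # True once there are no more rows to visit


class PacedRegionScanner:
    """Hypothetical stand-in for a server-side scanner over one region."""

    def __init__(self, rows: Dict[int, str],
                 predicate: Callable[[str], bool], chunk_size: int):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._rows = rows
        self._predicate = predicate
        self._chunk_size = chunk_size
        self._pending = sorted(rows)  # row keys still to visit, in key order
        self._pos = 0

    def next(self) -> ChunkResult:
        # Step 2: do at most chunk_size rows worth of work, then return.
        chunk = self._pending[self._pos:self._pos + self._chunk_size]
        self._pos += len(chunk)
        deleted = 0
        for key in chunk:
            if self._predicate(self._rows[key]):
                del self._rows[key]
                deleted += 1
        return ChunkResult(len(chunk), deleted,
                           self._pos >= len(self._pending))


def paced_delete(rows: Dict[int, str], predicate: Callable[[str], bool],
                 chunk_size: int) -> Tuple[int, int]:
    """Client side: keep the scanner open and call next() until done.

    Returns (total rows deleted, number of next() calls made).
    """
    scanner = PacedRegionScanner(rows, predicate, chunk_size)  # step 1
    total_deleted = 0
    calls = 0
    while True:
        result = scanner.next()  # steps 3 and 4
        calls += 1
        total_deleted += result.rows_deleted
        if result.exhausted:
            break
    return total_deleted, calls


if __name__ == "__main__":
    table = {i: ("odd" if i % 2 else "even") for i in range(10)}
    print(paced_delete(table, lambda v: v == "odd", chunk_size=3))  # (5, 4)
    print(sorted(table))  # [0, 2, 4, 6, 8]
```

Each next() call is bounded by the chunk size, so in a real deployment the gap between calls is presumably where the server could schedule other work and where the client could throttle or back off.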
This way we get the benefits of both approaches: (1) the work happens close to
where the data is, and (2) the client can pace the work and the server gets a
chance to schedule other work.