keith-turner opened a new issue, #4571:
URL: https://github.com/apache/accumulo/issues/4571
When an on-demand tablet is requested to be hosted, the following happens in
the manager:
1. Set the `requestToHostColumn`
2. Trigger an event that causes the TGW to scan the range
3. The TGW then:
   1. scans the metadata table and finds the tablet w/ `requestToHostColumn`
   2. consults the balancer to get an assignment location
   3. sets a future location
   4. sends an assignment RPC to the tablet server
When running SplitMillionIT, the following events happen:
1. A table w/ million tablets is cloned. This triggers an event that
causes the TGW to scan the 1 million tablets.
2. SplitMillionIT tries to scan 100 of the 1M tablets. This causes a
request to the manager to host the 100 tablets.
3. The above request gets backed up behind the TGW scanning the 1 million
new tablets from the clone, which makes scanning the 100 tablets take a while.
One possible way this could be improved is to do the following when the
manager gets a request to host tablets:
1. consults balancer to get assignment locations
2. sets a future location
3. sends assignment RPC to tablet server
So, do what the TGW is doing directly and, instead of setting
`requestToHostColumn`, set the `future` column. It may be possible to drop the
`requestToHostColumn`. The `future` column being set has very similar
properties to the `requestToHostColumn`. When the TGW sees a future column, it
will send an RPC to the tablet server to load the tablet. So if the manager is
working on a request to host a tablet, sets the future location, and then
dies, when the manager starts again the TGW will see the future column and send
an RPC.
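
The direct path and its crash-recovery property can be sketched roughly as
below. This is only an in-memory illustration: `SketchBalancer`,
`DirectHostingSketch`, `hostTablets`, `tgwScan`, and the `crashBeforeRpc` flag
are all hypothetical names, not Accumulo APIs. A map stands in for the
metadata table and a list stands in for assignment RPCs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a balancer plugin; not Accumulo's real SPI. */
interface SketchBalancer {
    String chooseServer(String tablet);
}

/** Hypothetical in-memory model of the manager handling hosting requests directly. */
class DirectHostingSketch {
    // Stand-in for the metadata table: tablet -> future location.
    final Map<String, String> futureColumn = new HashMap<>();
    // Stand-in for assignment RPCs, recorded as "server<-tablet".
    final List<String> sentRpcs = new ArrayList<>();
    private final SketchBalancer balancer;

    DirectHostingSketch(SketchBalancer balancer) {
        this.balancer = balancer;
    }

    /**
     * Handles a hosting request without waiting for a TGW scan.
     * crashBeforeRpc simulates the manager dying after the future
     * column is written but before any RPC is sent.
     */
    void hostTablets(List<String> tablets, boolean crashBeforeRpc) {
        for (String tablet : tablets) {
            String server = balancer.chooseServer(tablet); // 1. consult balancer
            futureColumn.put(tablet, server);              // 2. set future location
            if (!crashBeforeRpc) {
                sentRpcs.add(server + "<-" + tablet);      // 3. send assignment RPC
            }
        }
    }

    /**
     * Models a later TGW scan: every tablet with a future column gets an
     * assignment RPC. Skipping already-sent RPCs is a simplification made
     * only to keep this sketch small.
     */
    void tgwScan() {
        for (Map.Entry<String, String> e : futureColumn.entrySet()) {
            String rpc = e.getValue() + "<-" + e.getKey();
            if (!sentRpcs.contains(rpc)) {
                sentRpcs.add(rpc);
            }
        }
    }
}
```

In this model, a request that "crashes" before step 3 still leaves the future
column behind, so a later `tgwScan()` picks the tablets up and sends the RPCs.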
One pain point in implementing the above strategy is that synchronization
may be needed around the balancer plugin. I'm not sure the expectations for
how a balancer plugin should handle concurrent calls are well defined. Some
balancer plugins are stateful but do not handle concurrency. This may point
to a larger need to revisit the design of the balancer plugin in elasticity
and get the expectations about its use well documented. A first cut of this
change could just synchronize access to the balancer plugin.
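
As a rough illustration of that first cut, a stateful balancer with no
internal locking could be wrapped so that only one caller enters it at a time.
Everything here is hypothetical: `AssignmentBalancer`, `RoundRobinBalancer`,
`SynchronizedBalancer`, and `BalancerDemo` are made-up names for the sketch
and do not reflect the actual Accumulo balancer interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified balancer interface (assumption, not the real SPI). */
interface AssignmentBalancer {
    String assign(String tablet, List<String> servers);
}

/** Stateful balancer with no concurrency handling of its own. */
class RoundRobinBalancer implements AssignmentBalancer {
    private int next = 0; // unguarded mutable state

    @Override
    public String assign(String tablet, List<String> servers) {
        String server = servers.get(next % servers.size());
        next++;
        return server;
    }
}

/** First-cut fix: serialize every call into the wrapped balancer. */
class SynchronizedBalancer implements AssignmentBalancer {
    private final AssignmentBalancer delegate;
    private final Object lock = new Object();

    SynchronizedBalancer(AssignmentBalancer delegate) {
        this.delegate = delegate;
    }

    @Override
    public String assign(String tablet, List<String> servers) {
        synchronized (lock) {
            return delegate.assign(tablet, servers);
        }
    }
}

/** Calls the balancer from several threads and counts assignments per server. */
final class BalancerDemo {
    static Map<String, Integer> run(AssignmentBalancer balancer, int threads, int callsPerThread) {
        List<String> servers = List.of("ts1", "ts2");
        ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    String server = balancer.assign("tablet", servers);
                    counts.computeIfAbsent(server, k -> new AtomicInteger()).incrementAndGet();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        Map<String, Integer> result = new HashMap<>();
        counts.forEach((server, count) -> result.put(server, count.get()));
        return result;
    }
}
```

With the wrapper, `BalancerDemo.run(new SynchronizedBalancer(new RoundRobinBalancer()), 4, 1000)`
splits the 4000 calls evenly across the two servers. The cost is that all
hosting requests serialize on one lock, which is why this is only a first cut.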