[I] Lower time to host ondemand tablets [accumulo]

via GitHub Fri, 17 May 2024 09:46:25 -0700


keith-turner opened a new issue, #4571:
URL: https://github.com/apache/accumulo/issues/4571


   When an on-demand tablet is requested to be hosted, the following happens in 
the manager:
   
    1. Set the `requestToHostColumn`
    2. Trigger an event that causes the TGW (TabletGroupWatcher) to scan the range
    3. The TGW then:
       1. scans the metadata table and finds tablets w/ `requestToHostColumn`
       2. consults the balancer to get an assignment location
       3. sets a future location
       4. sends an assignment RPC to the tablet server
   
   When running SplitMillionIT, I see the following events happen:
   
     1. A table w/ a million tablets is cloned.  This triggers an event that 
causes the TGW to scan the 1 million tablets.
     2. SplitMillionIT tries to scan 100 of the 1M tablets.  This causes a 
request to the manager to host the 100 tablets.
     3. The above request gets backed up behind the TGW scanning the 1 million 
new tablets from the clone, and that makes scanning the 100 tablets take a while.
   
   
   One possible way this could be improved is to do the following when the 
manager gets a request to host tablets:
   
    1. consults balancer to get assignment locations
    2. sets a future location
    3. sends assignment RPC to tablet server
   
   So, do what the TGW is doing directly, and instead of setting 
`requestToHostColumn` set the `future` column.  It may be possible to drop the 
`requestToHostColumn`.  The `future` column being set has very similar 
properties to the `requestToHostColumn`.  When the TGW sees a future column, it 
will send an RPC to the tablet server to load the tablet.  So if the manager is 
working on a request to host a tablet, sets the future location, and then 
dies, when the manager starts again the TGW will see the future column and send 
an RPC.
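
   A minimal sketch of the direct path and its restart behavior. Everything in 
it is hypothetical: `Balancer`, `AssignmentRpc`, `Metadata`, `hostDirectly` and 
`tgwScan` are stand-ins invented for illustration, not Accumulo's real classes 
or method names, and the metadata table is reduced to an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Accumulo's real classes.
final class DirectHostSketch {

  /** Stand-in for the balancer plugin: picks a tablet server for a tablet. */
  interface Balancer {
    String chooseServer(String tablet);
  }

  /** Stand-in for the assignment RPC sent to a tablet server. */
  interface AssignmentRpc {
    void assign(String server, String tablet);
  }

  /** Stand-in for the metadata table: tablet -> future location. */
  static final class Metadata {
    final Map<String, String> future = new HashMap<>();
  }

  /** Proposed path: handle the hosting request without waiting for a TGW scan. */
  static void hostDirectly(List<String> tablets, Balancer balancer,
      Metadata metadata, AssignmentRpc rpc) {
    for (String tablet : tablets) {
      String server = balancer.chooseServer(tablet); // 1. consult the balancer
      metadata.future.put(tablet, server);           // 2. set the future location
      rpc.assign(server, tablet);                    // 3. send the assignment RPC
    }
  }

  /** What the TGW does on a scan: every future location it sees gets an RPC. */
  static void tgwScan(Metadata metadata, AssignmentRpc rpc) {
    metadata.future.forEach((tablet, server) -> rpc.assign(server, tablet));
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    AssignmentRpc rpc = (server, tablet) -> sent.add(server + " <- " + tablet);

    // Normal case: all three steps complete inside the request handler.
    hostDirectly(List.of("tablet-a"), tablet -> "ts1", new Metadata(), rpc);

    // Crash case: the future location was written, but the manager died
    // before the RPC went out. The restarted manager's TGW resends it.
    Metadata afterCrash = new Metadata();
    afterCrash.future.put("tablet-b", "ts2");
    tgwScan(afterCrash, rpc);

    System.out.println(sent);
  }
}
```

   The point of the sketch is that the `future` entry alone is enough for the 
TGW to finish the job after a restart, which is why the separate 
`requestToHostColumn` may not be needed.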
   
   One pain point in implementing the above strategy is that synchronization 
may be needed around the balancer plugin.  I am not sure the expectations for 
how a balancer plugin handles concurrent calls are well defined.  Some 
balancer plugins are stateful but do not have handling for concurrency.  This 
may point to a larger need to revisit the design of the balancer plugin in 
elasticity and get expectations about its use well documented.  A first cut of 
this change could just synchronize access to the balancer plugin.
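
   A rough sketch of what that first cut could look like: a wrapper that 
serializes every call into a stateful plugin. The `Balancer` interface, 
`RoundRobinBalancer` and `SynchronizedBalancer` below are hypothetical names 
made up for this example and do not reflect Accumulo's actual balancer SPI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the interface below is not Accumulo's balancer SPI.
final class SynchronizedBalancerSketch {

  /** Stand-in for a balancer plugin interface. */
  interface Balancer {
    String chooseServer(String tablet);
  }

  /** A stateful plugin with no concurrency handling of its own. */
  static final class RoundRobinBalancer implements Balancer {
    private final String[] servers;
    private final Map<String, Integer> assigned = new HashMap<>(); // not thread safe
    private int next = 0;                                           // not thread safe

    RoundRobinBalancer(String... servers) {
      this.servers = servers;
    }

    @Override
    public String chooseServer(String tablet) {
      String server = servers[next];
      next = (next + 1) % servers.length;
      assigned.merge(server, 1, Integer::sum);
      return server;
    }

    int assignedTo(String server) {
      return assigned.getOrDefault(server, 0);
    }
  }

  /** First-cut fix: one lock around every call into the wrapped plugin. */
  static final class SynchronizedBalancer implements Balancer {
    private final Balancer delegate;

    SynchronizedBalancer(Balancer delegate) {
      this.delegate = delegate;
    }

    @Override
    public synchronized String chooseServer(String tablet) {
      return delegate.chooseServer(tablet);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RoundRobinBalancer plugin = new RoundRobinBalancer("ts1", "ts2");
    Balancer balancer = new SynchronizedBalancer(plugin);

    // Several request-handling threads and the TGW could all call the balancer.
    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          balancer.chooseServer("tablet");
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println(plugin.assignedTo("ts1") + " " + plugin.assignedTo("ts2"));
  }
}
```

   This only works if every caller, including the TGW, goes through the same 
wrapper. A single lock also serializes all balancing decisions, so it is a 
stopgap rather than a substitute for documenting the plugin's concurrency 
expectations.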
   
   


