ddanielr opened a new issue, #3190:
URL: https://github.com/apache/accumulo/issues/3190

   **Is your feature request related to a problem? Please describe.**
   Right now the balancer creates one resource pool per table, driven by a 
per-table property, and uses the table name as the pool name. 
   
https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/spi/balancer/HostRegexTableLoadBalancer.java#L274-L285
   
   Per-table pool naming makes sense if you have a small number of tables that 
are semi-persistent. However, anyone using a large number of short-lived tables 
would end up creating a large number of pools, many of which are likely to be 
similar or to overlap.
   
   In addition to the high volume of pools, this isn't very user-friendly, as 
a host regex needs to be set for each table. 
   
   Since any typo or other problem in the regex may result in an undesired 
balancing state, setting this regex property per table increases the number of 
values a person has to compare by hand when troubleshooting a balancing issue. 
   
   Also, if a user is looking at pool properties, it is harder to deconstruct a 
regex to identify which servers are being matched than to read a descriptive 
pool name. 
   
   **Describe the solution you'd like**
   Move the pool regex properties to a defined system property instead of a 
per-table property.
   Example property layout. 
   ```
   resource.pool.name
   resource.pool.name.regex
   resource.pool.name.using.ips
   resource.pool.name.max.migrations
   etc...
   ```
   Add a "resource-group" per-table property that corresponds to a defined 
resource pool name. 
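   
   As a rough illustration of how such system-level definitions might be 
loaded, here is a minimal sketch. It assumes that `name` in the layout above is 
a placeholder for the pool name, so keys look like 
`resource.pool.<poolName>.regex`. The class `ResourcePoolConfigSketch`, the 
method `parsePools`, and the host names are hypothetical and are not part of 
Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names exist in Accumulo.
class ResourcePoolConfigSketch {
  // Assumed key shape: resource.pool.<poolName>.regex
  static final String PREFIX = "resource.pool.";
  static final String REGEX_SUFFIX = ".regex";

  /** Builds poolName -> compiled host pattern from system-level properties. */
  static Map<String, Pattern> parsePools(Map<String, String> systemProps) {
    Map<String, Pattern> pools = new TreeMap<>();
    for (Map.Entry<String, String> entry : systemProps.entrySet()) {
      String key = entry.getKey();
      // Guard against keys such as "resource.pool.regex" that carry no pool name.
      boolean hasName = key.length() > PREFIX.length() + REGEX_SUFFIX.length();
      if (key.startsWith(PREFIX) && key.endsWith(REGEX_SUFFIX) && hasName) {
        String poolName =
            key.substring(PREFIX.length(), key.length() - REGEX_SUFFIX.length());
        // Pattern.compile throws PatternSyntaxException on a malformed regex,
        // so a bad pool definition fails once, when the pools are loaded.
        pools.put(poolName, Pattern.compile(entry.getValue()));
      }
    }
    return pools;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("resource.pool.high-performance.regex", "r01s0[1-4]\\.example\\.com");
    props.put("resource.pool.high-performance.max.migrations", "250");
    props.put("resource.pool.low-performance.regex", "r02.*");
    // Only the ".regex" keys define pools; other suffixes are skipped here.
    System.out.println(parsePools(props).keySet());
  }
}
```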
   
   Users would define descriptive resource pool names 
(`high-performance`, `medium-performance`, `low-performance`, `rack-adjacent`, 
`top-of-rack`, etc.) and set these regex values ONCE, on pool initialization. 
   
   On table creation, the users only need to set a descriptive resource pool 
name from a list of currently defined pools. 
   This per-table property can be validated against the defined pool list and 
rejected if it doesn't match. 
   This reduces the chance of a single table being improperly balanced. 
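   
   The validation step could be as small as a set-membership check. The sketch 
below is only an assumption about how it might look: 
`ResourceGroupValidatorSketch` and `requireDefinedPool` are hypothetical names, 
and where such a check would hook into table creation is left open.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the class and method names are illustrative only.
class ResourceGroupValidatorSketch {
  /** Returns the pool name if it is defined; otherwise rejects the value. */
  static String requireDefinedPool(String requested, Set<String> definedPools) {
    // The null check runs first, so contains() never sees a null argument.
    if (requested == null || !definedPools.contains(requested)) {
      throw new IllegalArgumentException(
          "Unknown resource pool '" + requested + "'; defined pools: " + definedPools);
    }
    return requested;
  }

  public static void main(String[] args) {
    Set<String> pools =
        new TreeSet<>(Arrays.asList("high-performance", "low-performance"));
    System.out.println(requireDefinedPool("high-performance", pools));
    try {
      // A mistyped pool name is rejected up front instead of silently
      // leaving the table improperly balanced.
      requireDefinedPool("hihg-performance", pools);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```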
   
   The balancer functionality would mostly remain the same, with slight changes 
to pool initialization, retrieving pool names, and matching the table property 
to a pool name.  
   
   There should also be a way of validating the regex against currently known 
servers from the CLI. 
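   
   The core of such a check is just listing which known hosts a pattern 
matches. A minimal sketch follows, assuming full-string matching on host names. 
`PoolRegexCheckSketch` and `matchingHosts` are hypothetical, no CLI command is 
implied, and the balancer's actual matching rules (for example host name vs. 
IP) may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not an existing Accumulo class or shell command.
class PoolRegexCheckSketch {
  /** Returns the known hosts that the regex matches in full, in input order. */
  static List<String> matchingHosts(String regex, List<String> knownHosts) {
    // A malformed regex fails here with PatternSyntaxException.
    Pattern pattern = Pattern.compile(regex);
    List<String> matched = new ArrayList<>();
    for (String host : knownHosts) {
      if (pattern.matcher(host).matches()) {
        matched.add(host);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> hosts = Arrays.asList(
        "r01s01.example.com", "r01s02.example.com", "r02s01.example.com");
    // An empty result is a hint that the regex is mistyped.
    System.out.println(matchingHosts("r01.*", hosts));
    System.out.println(matchingHosts("r03.*", hosts));
  }
}
```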
   
   **Describe alternatives you've considered**
   I concede that you could solve this by setting the host patterns at the 
namespace level; however, this means a user is restricted to a single balancing 
strategy per namespace. 
   
   **Additional context**
   This is related to the server group conversation in #3178. However, that 
conversation seems to be scoped to 3.0.0 and beyond, whereas this change isn't 
that drastic and could be applied to 2.x versions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.