ddanielr commented on issue #3211: URL: https://github.com/apache/accumulo/issues/3211#issuecomment-1458672277
> > Is there a mechanism to alert on approaching resource limits or evaluate the cluster's current size/use before marking a tablet as onDemand?
>
> Given that we don't know which schedulers might be in use on a cluster, I think we just need to emit metrics that can be used for a scheduling system to make a determination that more tablet servers are needed. Accumulo doesn't do any alerting, that's typically set up by the users of the system - trigger an alert when some criteria is met.

:+1: Using an external metric collection system makes sense.

> > Given the possibility of a limited resource footprint, what mechanism is going to be used for scheduling the tablet hosting?
>
> I don't think we should build or use a specific scheduler. [KEDA](https://keda.sh/) and [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) exist already and I'm sure the different commercial cloud vendors may supply their own solutions as well (Azure has [VPA](https://learn.microsoft.com/en-us/azure/aks/vertical-pod-autoscaler) for example).

Scheduling was probably the wrong word to use, but I agree on using a pre-defined scheduler for k8s resources. I think this question is wrapped into the priority discussion.

> > What determines hosting priority of onDemand tables? should this exist? should it be a per-client setting or perhaps per table?
>
> I don't think we have had any discussion of priority. Can you give an example of how priority would work?

Please correct any inaccuracy here.

Let's say we have an Accumulo cluster that does not have tserver groups implemented and services two clients. Both clients have tables that, individually, fit within the cluster's resource footprint but, when hosted at the same time, exceed the current resources.

If my understanding is correct, both clients can attempt writes on "onDemand" tables that are currently unloaded. Or, both clients could request that tables be moved from offline to "onDemand".
Each of these actions would result in the tablets being marked for assignment, and the TabletGroupWatcher would then attempt to load them onto tservers.

Since there are not enough resources for both tables to be fully hosted, will the TabletGroupWatcher discern which "onDemand" tablets should be hosted first? Or does it just see a pool of tablets that need to be assigned and assign them indiscriminately?

If it's the latter, and since clients are attempting actions at the table level rather than the tablet level, does that mean all tablets for a given table need to be assigned and loaded before the action can be completed?

If so, then as part of bringing tablets online, a priority level (1-x) or an "onDemand" request timestamp could be included in the metadata. Then the TabletGroupWatcher could ensure all tablets of a specific "onDemand" request are fully hosted prior to assigning a second "onDemand" request's tablets.

Otherwise, wouldn't there be a resource blocking issue where tablets of both tables are being assigned, but neither table can be fully hosted due to resource constraints?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
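
To make the request-timestamp idea from the comment concrete, here is a rough sketch of how pending "onDemand" requests might be ordered so that one table is fully hosted before the next one gets any slots. Everything in it is hypothetical: `HostingRequest`, `planAssignments`, and the notion of "free tablet slots" are illustrative names and simplifications, not Accumulo classes or the actual TabletGroupWatcher logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OnDemandOrderingSketch {

  // Hypothetical record of one "onDemand" hosting request; not an Accumulo type.
  static final class HostingRequest {
    final String table;
    final long requestTimeMillis; // assumed to be stored alongside the tablet metadata
    final int tabletCount;        // tablets that must be hosted for the request to complete

    HostingRequest(String table, long requestTimeMillis, int tabletCount) {
      this.table = table;
      this.requestTimeMillis = requestTimeMillis;
      this.tabletCount = tabletCount;
    }
  }

  // Returns the tables that can be fully hosted, oldest request first,
  // given a hypothetical number of free tablet slots in the cluster.
  static List<String> planAssignments(List<HostingRequest> pending, int freeSlots) {
    List<HostingRequest> ordered = new ArrayList<>(pending);
    ordered.sort(Comparator.comparingLong((HostingRequest r) -> r.requestTimeMillis));

    List<String> hosted = new ArrayList<>();
    int remaining = freeSlots;
    for (HostingRequest r : ordered) {
      if (r.tabletCount > remaining) {
        // Strict ordering: stop here so a later request cannot take slots
        // that the earlier, still-unsatisfied request is waiting for.
        break;
      }
      remaining -= r.tabletCount;
      hosted.add(r.table);
    }
    return hosted;
  }

  public static void main(String[] args) {
    // Two clients whose tables each fit alone (70 and 60 tablets)
    // but not together in a cluster with 100 free slots.
    List<HostingRequest> pending = List.of(
        new HostingRequest("clientB_table", 2_000L, 60),
        new HostingRequest("clientA_table", 1_000L, 70));
    System.out.println(planAssignments(pending, 100)); // prints [clientA_table]
  }
}
```

With indiscriminate assignment, both tables could each end up partially hosted and neither request would complete. Ordering by request time avoids that at the cost of making the later request wait until more capacity is available.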
