On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <[email protected]> wrote:
>
> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <[email protected]> wrote:
>
>> I have not put together a list. Perhaps the following will help with the
>> context though...
>>
>> The "reconcile loop" of the operator will take the cluster CRs and
>> reconcile them against the actual cluster config. At the 20k-foot level,
>> this amounts to something like determining that there should be 8 gluster
>> pods running, and making the appropriate changes if that doesn't match
>> reality. In practical terms, the construction of this reconciliation loop
>> can be thought of as a set (array) of 3-tuples:
>> [{should_act() -> bool, can_act() -> bool, action() -> ok | error}, {..., ..., ...}, ...]
>>
>> Each capability of the operator would be expressed as one of these tuples:
>> should_act() : true if the action() should be taken
>> can_act()    : true if the prerequisites for taking the action are met
>> action()     : make the change. Only run if should && can.
>> (Note that I believe should_act() and can_act() should not be separate in
>> the implementation, for reasons I'll not go into here.)
>>
>> An example action might be "upgrade the container image for pod X". The
>> associated should_act() would be triggered if the "image=" of the pod
>> doesn't match the desired "image=" in the operator CRs. The can_act()
>> evaluation would verify that it's ok to do this... Off the top of my head:
>> - All volumes with a brick on this pod should be fully healed
>> - Sufficient cluster nodes should be up such that quorum is not lost when
>>   this node goes down (does this matter?)
>> - The proposed image is compatible with the current version of the CSI
>>   driver(s), the operator, and other gluster pods
>> - Probably some other stuff
>> The action() would update the "image=" in the Deployment to trigger the
>> rollout.
>>
>> The idea is that queries would be made, both to the kube API and to the
>> gluster cluster, to verify the necessary preconditions for an action
>> prior to that action being invoked. There would obviously be commonality
>> among the preconditions for various actions, so the results should be
>> fetched exactly once per reconcile cycle. Also note, 1 cycle == at most
>> 1 action(), due to the action changing the state of the system.
>>
>> Given that we haven't designed (or even listed) all the potential
>> action()s, I can't give you a list of everything to query. I guarantee
>> we'll need to know the up/down status, heal counts, and free capacity
>> for each brick and node.
>
> Thanks for the detailed explanation. This helps. One question though: is 5
> seconds a hard limit, or is there a possibility to configure it?

I put together an idea for reducing the mgmt operation latency involving
mounts at https://github.com/gluster/glusterd2/issues/1069, comments welcome.

@john Still want to know whether there is a way to configure the hard
limit...

>> -John
>>
>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri
>> <[email protected]> wrote:
>>
>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <[email protected]> wrote:
>>>
>>>> To add an additional data point... The operator will need to regularly
>>>> reconcile the true state of the gluster cluster with the desired state
>>>> stored in Kubernetes. This task will be required frequently (i.e.,
>>>> operator-framework defaults to every 5s even if there are no config
>>>> changes).
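
(To make sure I am reading the should_act()/can_act()/action() idea above
correctly, here is a rough Go sketch of how such a tuple table might look.
All of the type and function names below are made up for illustration; this
is not the actual operator code.)

// Rough sketch only -- all names are made up, not actual operator code.
package main

import "fmt"

// ClusterState holds everything fetched once per reconcile cycle
// (kube API objects, gluster up/down status, heal counts, free capacity...).
type ClusterState struct {
	CurrentImage string
	DesiredImage string
	PendingHeals int
	NodesUp      int
	NodesTotal   int
}

// Capability is one {should_act, can_act, action} tuple.
type Capability struct {
	Name      string
	ShouldAct func(s *ClusterState) bool
	CanAct    func(s *ClusterState) bool
	Action    func(s *ClusterState) error
}

// Example capability: "upgrade the container image for pod X".
var upgradeImage = Capability{
	Name: "upgrade-image",
	ShouldAct: func(s *ClusterState) bool {
		// Triggered if the running image doesn't match the desired image in the CRs.
		return s.CurrentImage != s.DesiredImage
	},
	CanAct: func(s *ClusterState) bool {
		// Preconditions: heals finished and quorum preserved if this node goes down.
		return s.PendingHeals == 0 && s.NodesUp > s.NodesTotal/2
	},
	Action: func(s *ClusterState) error {
		// The real code would patch "image=" in the Deployment to start the rollout.
		fmt.Println("updating image to", s.DesiredImage)
		return nil
	},
}

// reconcile runs at most one action per cycle, since an action changes
// the state of the system.
func reconcile(s *ClusterState, caps []Capability) error {
	for _, c := range caps {
		if c.ShouldAct(s) && c.CanAct(s) {
			return c.Action(s)
		}
	}
	return nil // nothing to do this cycle
}

func main() {
	state := &ClusterState{
		CurrentImage: "gluster:4.0",
		DesiredImage: "gluster:4.1",
		NodesUp:      3,
		NodesTotal:   3,
	}
	_ = reconcile(state, []Capability{upgradeImage})
}

With something like this, all the state the predicates need could be fetched
once at the start of a cycle and shared across tuples, and at most one
action() runs per cycle.
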
>>>> The actual amount of data we will need to query from the cluster is
>>>> currently TBD and likely significantly affected by the Heketi/GD1 vs.
>>>> GD2 choice.
>>>
>>> Do we have any partial list of the data we will gather? Just want to
>>> understand what this might entail already...
>>>
>>>> -John
>>>>
>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri
>>>> <[email protected]> wrote:
>>>>
>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>>> <[email protected]> wrote:
>>>>>>> hi,
>>>>>>> Quite a few commands to monitor gluster at the moment take almost a
>>>>>>> second to give output.
>>>>>>
>>>>>> Is this at the (most) minimum recommended cluster size?
>>>>>
>>>>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the cluster.
>>>>>
>>>>>>> Some categories of these commands:
>>>>>>> 1) Any command that needs to do some sort of mount/glfs_init.
>>>>>>> Examples: 1) the heal info family of commands 2) statfs to find
>>>>>>> space availability etc. (On my laptop, on a replica 3 volume with
>>>>>>> all local bricks, glfs_init takes 0.3 seconds on average.)
>>>>>>> 2) glusterd commands that need to wait for the previous command to
>>>>>>> unlock. If the previous command is something related to an lvm
>>>>>>> snapshot, which takes quite a few seconds, it would be even more
>>>>>>> time consuming.
>>>>>>>
>>>>>>> Nowadays container workloads have hundreds of volumes, if not
>>>>>>> thousands. If we want to serve any monitoring solution at this scale
>>>>>>> (I have seen customers use up to 600 volumes at a time, and it will
>>>>>>> only get bigger), and let's say collecting metrics takes 2 seconds
>>>>>>> per volume (taking the worst example, which has all major features
>>>>>>> enabled like snapshot/geo-rep/quota etc.), that will mean it takes
>>>>>>> 20 minutes to collect metrics for a cluster with 600 volumes. What
>>>>>>> are the ways in which we can make this number more manageable? I was
>>>>>>> initially thinking it may be possible to get gd2 to execute commands
>>>>>>> in parallel on different volumes, so potentially we could get this
>>>>>>> done in ~2 seconds. But quite a few of the metrics need a mount or
>>>>>>> the equivalent of a mount (glfs_init) to collect information like
>>>>>>> statfs, number of pending heals, quota usage etc. This may lead to
>>>>>>> high memory usage, as the size of the mounts tends to be high.
>>>>>>
>>>>>> I am not sure if starting from the "worst example" (it certainly is
>>>>>> not) is a good place to start from.
>>>>>
>>>>> I didn't understand your statement. Are you saying 600 volumes is a
>>>>> worst example?
>>>>>
>>>>>> That said, for any environment with that number of disposable
>>>>>> volumes, what kind of metrics do actually make any sense/impact?
>>>>>
>>>>> The same metrics you track for long-running volumes. It is just that
>>>>> the way the metrics are interpreted will be different. On a
>>>>> long-running volume, you would look at the metrics and try to find
>>>>> why the volume is not giving the expected performance in the last hour.
>>>>> Whereas in this case, you would look at the metrics and find the
>>>>> reason why volumes that were created and deleted in the last hour
>>>>> didn't give the expected performance.
>>>>>
>>>>>>> I wanted to seek suggestions from others on how to come to a
>>>>>>> conclusion about which path to take and what problems to solve.
>>>>>>>
>>>>>>> I will be happy to raise github issues based on our conclusions on
>>>>>>> this mail thread.
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>> --
>>>>>> sankarshan mukhopadhyay
>>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>>
>>>>> --
>>>>> Pranith
>>>
>>> --
>>> Pranith
>
> --
> Pranith

--
Pranith
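
PS: To make the "execute commands in parallel on different volumes" idea
from my original mail above a bit more concrete, here is a rough Go sketch
of bounded parallel collection. collectVolume() and every name here is a
made-up placeholder, not gd2 code; the real per-volume work would be the
glfs_init/statfs/heal-count style queries discussed above.

// Sketch of collecting per-volume metrics in parallel with a bounded
// worker pool. Illustration only; collectVolume stands in for the real
// glfs_init/statfs/heal-info style queries.
package main

import (
	"fmt"
	"sync"
	"time"
)

type VolumeMetrics struct {
	Volume       string
	FreeBytes    uint64
	PendingHeals int
	Err          error
}

// collectVolume pretends to gather metrics for one volume; the real thing
// could take ~2s for a volume with snapshot/geo-rep/quota enabled.
func collectVolume(name string) VolumeMetrics {
	time.Sleep(10 * time.Millisecond) // placeholder for the real work
	return VolumeMetrics{Volume: name}
}

// collectAll fans the volumes out over `workers` goroutines, so 600 volumes
// at 2s each finish in roughly (600/workers)*2s instead of 20 minutes, at
// the cost of holding `workers` mounts' worth of memory at once.
func collectAll(volumes []string, workers int) []VolumeMetrics {
	jobs := make(chan string)
	results := make(chan VolumeMetrics, len(volumes))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range jobs {
				results <- collectVolume(v)
			}
		}()
	}

	for _, v := range volumes {
		jobs <- v
	}
	close(jobs)
	wg.Wait()
	close(results)

	var out []VolumeMetrics
	for m := range results {
		out = append(out, m)
	}
	return out
}

func main() {
	volumes := []string{"vol1", "vol2", "vol3"}
	for _, m := range collectAll(volumes, 2) {
		fmt.Printf("%s: free=%d heals=%d err=%v\n",
			m.Volume, m.FreeBytes, m.PendingHeals, m.Err)
	}
}

With ~10 workers the 600-volume case would drop from ~20 minutes to about
2 minutes, and full parallelism would get close to the ~2 seconds mentioned
above, at the cost of keeping that many mounts alive at once.
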
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
