It is configurable. Use the default as a notion of scale... 5s may become 30s; it won't become 5m. Also remember, this is the maximum interval between reconciles, not the minimum. A change to a watched kube resource will cause an immediate reconcile. The periodic, timer-based loop is just a fallback to catch state changes that are not represented in the kube API.
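For illustration, here is a minimal Go sketch of that "fallback timer" pattern, written against sigs.k8s.io/controller-runtime conventions rather than the exact operator-framework API; the reconciler type and the 30s value are placeholders, not the operator's actual code:

// A change to a watched resource triggers Reconcile immediately;
// RequeueAfter only bounds how long we wait when nothing watched changes.
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// resyncPeriod is the fallback interval, not a polling rate. Raising it
// (e.g. 5s -> 30s) does not make the operator less responsive to CR edits.
const resyncPeriod = 30 * time.Second

type GlusterClusterReconciler struct{}

func (r *GlusterClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch the cluster CRs, compare desired vs. actual state,
	// and apply at most one corrective action ...

	// Ask to be called again even if no watched object changes, so state
	// that is invisible to the kube API (e.g. brick health) is re-checked.
	return ctrl.Result{RequeueAfter: resyncPeriod}, nil
}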
On Thu, Jul 26, 2018 at 12:57 AM Pranith Kumar Karampuri <[email protected]> wrote:

> On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <[email protected]> wrote:
>
>> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <[email protected]> wrote:
>>
>>> I have not put together a list. Perhaps the following will help w/ the context though...
>>>
>>> The "reconcile loop" of the operator will take the cluster CRs and reconcile them against the actual cluster config. At the 20k foot level, this amounts to something like determining there should be 8 gluster pods running, and making the appropriate changes if that doesn't match reality. In practical terms, the construction of this reconciliation loop can be thought of as a set (array) of 3-tuples: [{should_act() -> bool, can_act() -> bool, action() -> ok, error}, {..., ..., ...}, ...]
>>>
>>> Each capability of the operator would be expressed as one of these tuples.
>>> should_act() : true if the action() should be taken
>>> can_act() : true if the prerequisites for taking the action are met
>>> action() : make the change. Only run if should && can.
>>> (note that I believe should_act() and can_act() should not be separate in the implementation, for reasons I'll not go into here)
>>>
>>> An example action might be "upgrade the container image for pod X". The associated should_act() would be triggered if the "image=" of the pod doesn't match the desired "image=" in the operator CRs. The can_act() evaluation would be verifying that it's ok to do this... Thinking from the top of my head:
>>> - All volumes w/ a brick on this pod should be fully healed
>>> - Sufficient cluster nodes should be up such that quorum is not lost when this node goes down (does this matter?)
>>> - The proposed image is compatible with the current version of the CSI driver(s), the operator, and other gluster pods
>>> - Probably some other stuff
>>> The action() would update the "image=" in the Deployment to trigger the rollout
>>>
>>> The idea is that queries would be made, both to the kube API and the gluster cluster, to verify the necessary preconditions for an action prior to that action being invoked. There would obviously be commonality among the preconditions for various actions, so the results should be fetched exactly once per reconcile cycle. Also note, 1 cycle == at most 1 action() due to the action changing the state of the system.
>>>
>>> Given that we haven't designed (or even listed) all the potential action()s, I can't give you a list of everything to query. I guarantee we'll need to know the up/down status, heal counts, and free capacity for each brick and node.
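A minimal Go sketch of the tuple structure described above; the names (ClusterState, Capability, ReconcileOnce, the upgrade-image example) are hypothetical placeholders, not the operator's actual API:

// One {should_act, can_act, action} tuple per operator capability.
package reconcile

import "fmt"

// ClusterState holds everything fetched exactly once per cycle: desired
// config from the CRs plus observed state from kube and gluster
// (pod status, heal counts, free capacity, ...).
type ClusterState struct {
	// desired and observed state, fetched once per cycle
}

type Capability struct {
	Name      string
	ShouldAct func(s *ClusterState) bool  // the action is needed
	CanAct    func(s *ClusterState) bool  // its preconditions are met
	Action    func(s *ClusterState) error // make the change
}

// ReconcileOnce runs at most one action per cycle, since any action changes
// the state of the system and later decisions would rest on stale data.
func ReconcileOnce(s *ClusterState, caps []Capability) error {
	for _, c := range caps {
		if c.ShouldAct(s) && c.CanAct(s) {
			if err := c.Action(s); err != nil {
				return fmt.Errorf("action %q failed: %v", c.Name, err)
			}
			return nil // 1 cycle == at most 1 action()
		}
	}
	return nil // nothing to do this cycle
}

// Example capability corresponding to "upgrade the container image for pod X".
var upgradeImage = Capability{
	Name:      "upgrade-image",
	ShouldAct: func(s *ClusterState) bool { return false /* pod image= differs from CR image= */ },
	CanAct:    func(s *ClusterState) bool { return true /* volumes healed, quorum safe, versions compatible */ },
	Action:    func(s *ClusterState) error { return nil /* patch the Deployment's image= */ },
}

Note how the precondition checks in can_act() map directly onto the bullet list above (heals complete, quorum preserved, image compatible with the CSI driver(s), the operator, and other gluster pods).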
>> Thanks for the detailed explanation. This helps. One question though: is 5 seconds a hard limit, or is there a possibility to configure it?
>
> I put together an idea for reducing the mgmt operation latency involving mounts at https://github.com/gluster/glusterd2/issues/1069, comments welcome.
> @john Still want to know if there exists a way to find if the hard limit can be configured...
>
>>> -John
>>>
>>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <[email protected]> wrote:
>>>
>>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <[email protected]> wrote:
>>>>
>>>>> To add an additional data point... The operator will need to regularly reconcile the true state of the gluster cluster with the desired state stored in kubernetes. This task will be required frequently (i.e., operator-framework defaults to every 5s even if there are no config changes). The actual amount of data we will need to query from the cluster is currently TBD and likely significantly affected by the Heketi/GD1 vs. GD2 choice.
>>>>
>>>> Do we have any partial list of data we will gather? Just want to understand what this might entail already...
>>>>
>>>>> -John
>>>>>
>>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <[email protected]> wrote:
>>>>>>
>>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri <[email protected]> wrote:
>>>>>>> > hi,
>>>>>>> > Quite a few commands to monitor gluster at the moment take almost a second to give output.
>>>>>>>
>>>>>>> Is this at the (most) minimum recommended cluster size?
>>>>>>
>>>>>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the cluster.
>>>>>>
>>>>>>> > Some categories of these commands:
>>>>>>> > 1) Any command that needs to do some sort of mount/glfs_init. Examples: 1) the heal info family of commands 2) statfs to find space availability etc. (On my laptop, on a replica 3 volume with all local bricks, glfs_init takes 0.3 seconds on average.)
>>>>>>> > 2) glusterd commands that need to wait for the previous command to unlock. If the previous command is something related to lvm snapshots, which takes quite a few seconds, it would be even more time consuming.
>>>>>>> >
>>>>>>> > Nowadays container workloads have hundreds of volumes, if not thousands. If we want to serve any monitoring solution at this scale (I have seen customers use up to 600 volumes at a time, and it will only get bigger), and let's say collecting metrics takes 2 seconds per volume (taking the worst example, which has all major features enabled like snapshot/geo-rep/quota etc.), that means it will take 20 minutes to collect metrics for a cluster with 600 volumes. What are the ways in which we can make this number more manageable? I was initially thinking maybe it is possible to get gd2 to execute commands in parallel on different volumes, so potentially we could get this done in ~2 seconds. But quite a few of the metrics need a mount or the equivalent of a mount (glfs_init) to collect information like statfs, number of pending heals, quota usage etc. This may lead to high memory usage, as the size of the mounts tends to be high.
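To make the parallel-collection idea above concrete, here is a minimal Go sketch of a bounded worker pool for per-volume metric gathering; VolumeMetrics, collectVolume, and maxInFlight are hypothetical placeholders, not gd2 or Heketi API:

package metrics

import "sync"

// VolumeMetrics is a stand-in for whatever gets gathered per volume
// (statfs/free space, pending heal counts, quota usage, ...).
type VolumeMetrics struct {
	Volume string
}

// CollectAll gathers metrics for every volume concurrently, but never runs
// more than maxInFlight collections (i.e., mounts/glfs_init-style sessions)
// at a time, so memory use stays bounded.
func CollectAll(volumes []string, maxInFlight int,
	collectVolume func(vol string) (VolumeMetrics, error)) []VolumeMetrics {

	sem := make(chan struct{}, maxInFlight) // counting semaphore
	out := make([]VolumeMetrics, len(volumes))
	var wg sync.WaitGroup

	for i, vol := range volumes {
		wg.Add(1)
		go func(i int, vol string) {
			defer wg.Done()
			sem <- struct{}{}        // take a slot (blocks when full)
			defer func() { <-sem }() // give it back
			m, err := collectVolume(vol)
			if err != nil {
				return // real code would record the per-volume error
			}
			out[i] = m
		}(i, vol)
	}
	wg.Wait()
	return out
}

With 600 volumes at ~2 seconds each, a cap of 30 concurrent collectors finishes a full sweep in roughly 40 seconds instead of 20 minutes, while never holding more than 30 mounts' worth of memory at once.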
>>>>>>> I am not sure if starting from the "worst example" (it certainly is not) is a good place to start from.
>>>>>>
>>>>>> I didn't understand your statement. Are you saying 600 volumes is a worst example?
>>>>>>
>>>>>>> That said, for any environment with that number of disposable volumes, what kind of metrics do actually make any sense/impact?
>>>>>>
>>>>>> Same metrics you track for long-running volumes. It is just that the way the metrics are interpreted will be different. On a long-running volume, you would look at the metrics and try to find out why the volume was not giving the expected performance in the last hour. Whereas in this case, you would look at the metrics and find the reason why volumes that were created and deleted in the last hour didn't give the expected performance.
>>>>>>
>>>>>>> > I wanted to seek suggestions from others on how to come to a conclusion about which path to take and what problems to solve.
>>>>>>> >
>>>>>>> > I will be happy to raise github issues based on our conclusions on this mail thread.
>>>>>>> >
>>>>>>> > --
>>>>>>> > Pranith
>>>>>>>
>>>>>>> --
>>>>>>> sankarshan mukhopadhyay <https://about.me/sankarshan.mukhopadhyay>
>>>>>>
>>>>>> --
>>>>>> Pranith
>
> --
> Pranith
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
