On Thu, Jul 26, 2018 at 8:30 PM, John Strunk <[email protected]> wrote:
> It is configurable. Use the default as a notion of scale... 5s may
> become 30s; it won't be 5m.
> Also remember, this is the maximum, not the minimum. A change to a
> watched kube resource will cause an immediate reconcile. The periodic,
> timer-based loop is just a fallback to catch state changes not
> represented in the kube API.

Cool, got it. Let us wait and see whether anyone objects to the proposed
solution. I request everyone to comment if they see any issues with
https://github.com/gluster/glusterd2/issues/1069

I think the EC/AFR/Quota components will definitely be affected by this
approach. CCing them. Please feel free to CC anyone who works on commands
that require a mount to give status.

> On Thu, Jul 26, 2018 at 12:57 AM Pranith Kumar Karampuri <
> [email protected]> wrote:
>
>> On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <
>> [email protected]> wrote:
>>
>>> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <[email protected]>
>>> wrote:
>>>
>>>> I have not put together a list. Perhaps the following will help
>>>> with the context, though...
>>>>
>>>> The "reconcile loop" of the operator will take the cluster CRs and
>>>> reconcile them against the actual cluster config. At the 20k-foot
>>>> level, this amounts to something like determining that there should
>>>> be 8 gluster pods running, and making the appropriate changes if
>>>> that doesn't match reality. In practical terms, the construction of
>>>> this reconciliation loop can be thought of as a set (array) of
>>>> 3-tuples:
>>>> [{should_act() -> bool, can_act() -> bool, action() -> ok, error},
>>>> {..., ..., ...}, ...]
>>>>
>>>> Each capability of the operator would be expressed as one of these
>>>> tuples:
>>>> should_act() : true if the action() should be taken
>>>> can_act()    : true if the prerequisites for taking the action are met
>>>> action()     : make the change. Only run if should && can.
>>>> (Note that I believe should_act() and can_act() should not be
>>>> separate in the implementation, for reasons I'll not go into here.)
>>>>
>>>> An example action might be "upgrade the container image for pod X".
>>>> The associated should_act() would be triggered if the "image=" of
>>>> the pod doesn't match the desired "image=" in the operator CRs. The
>>>> can_act() evaluation would verify that it's OK to do this... Off the
>>>> top of my head:
>>>> - All volumes with a brick on this pod should be fully healed
>>>> - Sufficient cluster nodes should be up such that quorum is not
>>>>   lost when this node goes down (does this matter?)
>>>> - The proposed image should be compatible with the current version
>>>>   of the CSI driver(s), the operator, and the other gluster pods
>>>> - Probably some other stuff
>>>> The action() would update the "image=" in the Deployment to trigger
>>>> the rollout.
>>>>
>>>> The idea is that queries would be made, both to the kube API and to
>>>> the gluster cluster, to verify the necessary preconditions for an
>>>> action prior to that action being invoked. There would obviously be
>>>> commonality among the preconditions of various actions, so the
>>>> results should be fetched exactly once per reconcile cycle. Also
>>>> note: 1 cycle == at most 1 action(), because the action changes the
>>>> state of the system.
>>>>
>>>> Given that we haven't designed (or even listed) all the potential
>>>> action()s, I can't give you a list of everything to query. I
>>>> guarantee we'll need to know the up/down status, heal counts, and
>>>> free capacity for each brick and node.
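To make sure I'm reading the 3-tuple construction the same way you are,
here is a minimal Go sketch of that shape. All of the types and names
below are hypothetical, not actual operator code: state is fetched once
per cycle, the capability table is walked in order, and at most one
action() runs per cycle.

package main

import (
    "errors"
    "fmt"
)

// clusterState holds everything fetched exactly once per reconcile
// cycle: kube objects plus gluster status (up/down state, heal counts,
// free capacity, ...).
type clusterState struct {
    podImage     string
    desiredImage string
    healPending  int
    nodesUp      int
    nodesTotal   int
}

// capability is one {should_act, can_act, action} 3-tuple.
type capability struct {
    name      string
    shouldAct func(s *clusterState) bool
    canAct    func(s *clusterState) bool
    action    func(s *clusterState) error
}

var capabilities = []capability{
    {
        name: "upgrade-container-image",
        // should_act: the pod's image doesn't match the CR's desired image.
        shouldAct: func(s *clusterState) bool { return s.podImage != s.desiredImage },
        // can_act: preconditions, e.g. volumes healed and quorum preserved
        // (the exact checks here are my guess, per the list above).
        canAct: func(s *clusterState) bool {
            return s.healPending == 0 && s.nodesUp > s.nodesTotal/2
        },
        // action: update "image=" in the Deployment to trigger the rollout.
        action: func(s *clusterState) error {
            fmt.Println("patching Deployment image to", s.desiredImage)
            return nil
        },
    },
    // ... one tuple per operator capability ...
}

// reconcile runs one cycle: walk the table and take at most one action,
// because any action changes the state of the system.
func reconcile(s *clusterState) error {
    for _, c := range capabilities {
        if !c.shouldAct(s) {
            continue
        }
        if !c.canAct(s) {
            // Preconditions not met; the next watch event or the
            // periodic resync retries on a later cycle.
            return errors.New(c.name + ": preconditions not met")
        }
        return c.action(s) // 1 cycle == at most 1 action()
    }
    return nil // nothing to do
}

func main() {
    s := &clusterState{podImage: "gluster:v1", desiredImage: "gluster:v2",
        nodesUp: 3, nodesTotal: 3}
    if err := reconcile(s); err != nil {
        fmt.Println("reconcile:", err)
    }
}

If that matches your intent, then the precondition data (up/down status,
heal counts, free capacity) would all be gathered into clusterState once
at the top of each cycle, which is exactly the query load I'm worried
about below.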
>>>
>>> Thanks for the detailed explanation. This helps. One question,
>>> though: is 5 seconds a hard limit, or is there a possibility to
>>> configure it?
>>
>> I put together an idea for reducing the mgmt operation latency
>> involving mounts at https://github.com/gluster/glusterd2/issues/1069;
>> comments welcome.
>> @john Still want to know whether there exists a way to configure the
>> hard limit...
>>
>>>> -John
>>>>
>>>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
>>>> [email protected]> wrote:
>>>>
>>>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> To add an additional data point... The operator will need to
>>>>>> regularly reconcile the true state of the gluster cluster with
>>>>>> the desired state stored in kubernetes. This task will be
>>>>>> required frequently (i.e., operator-framework defaults to every
>>>>>> 5s even if there are no config changes). The actual amount of
>>>>>> data we will need to query from the cluster is currently TBD and
>>>>>> likely significantly affected by the Heketi/GD1 vs. GD2 choice.
>>>>>
>>>>> Do we have any partial list of the data we will gather? Just want
>>>>> to understand what this might entail...
>>>>>
>>>>>> -John
>>>>>>
>>>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>>>>> <[email protected]> wrote:
>>>>>>>> > hi,
>>>>>>>> > Quite a few commands to monitor gluster at the moment take
>>>>>>>> > almost a second to give output.
>>>>>>>>
>>>>>>>> Is this at the (most) minimum recommended cluster size?
>>>>>>>
>>>>>>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the
>>>>>>> cluster.
>>>>>>>
>>>>>>>> > Some categories of these commands:
>>>>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>>>>> > Examples: 1) the heal info family of commands 2) statfs to
>>>>>>>> > find space availability etc. (On my laptop, on a replica 3
>>>>>>>> > volume with all local bricks, glfs_init takes 0.3 seconds on
>>>>>>>> > average.)
>>>>>>>> > 2) glusterd commands that need to wait for the previous
>>>>>>>> > command to unlock. If the previous command is something
>>>>>>>> > related to an lvm snapshot, which takes quite a few seconds,
>>>>>>>> > it would be even more time consuming.
>>>>>>>> >
>>>>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>>>>> > thousands. If we want to serve any monitoring solution at
>>>>>>>> > this scale (I have seen customers use up to 600 volumes at a
>>>>>>>> > time, and it will only get bigger), and let's say collecting
>>>>>>>> > metrics takes 2 seconds per volume (taking the worst example,
>>>>>>>> > which has all major features enabled, like
>>>>>>>> > snapshot/geo-rep/quota etc.), that means it will take 20
>>>>>>>> > minutes to collect the metrics of a cluster with 600 volumes.
>>>>>>>> > What are the ways in which we can make this number more
>>>>>>>> > manageable? I was initially thinking it may be possible to
>>>>>>>> > get gd2 to execute commands in parallel on different volumes,
>>>>>>>> > so potentially we could get this done in ~2 seconds. But
>>>>>>>> > quite a few of the metrics need a mount, or the equivalent of
>>>>>>>> > a mount (glfs_init), to collect information like statfs, the
>>>>>>>> > number of pending heals, quota usage etc. This may lead to
>>>>>>>> > high memory usage, as the size of the mounts tends to be
>>>>>>>> > high.
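To make the parallel-collection idea in my mail above concrete, here is
a rough Go sketch: the per-volume collection runs concurrently, but the
number of simultaneous mounts is capped so memory stays bounded.
collectVolumeMetrics below is a hypothetical stand-in for the per-volume
glfs_init plus queries; it is not a real gd2 API.

package main

import (
    "fmt"
    "sync"
    "time"
)

// volumeMetrics is whatever the mount-based queries return: statfs
// results, pending heal count, quota usage, and so on.
type volumeMetrics struct {
    volume      string
    pendingHeal int
}

// collectVolumeMetrics stands in for the expensive per-volume work
// (glfs_init + the actual queries; ~2s in the worst case above).
func collectVolumeMetrics(volume string) volumeMetrics {
    time.Sleep(2 * time.Second) // placeholder for mount + queries
    return volumeMetrics{volume: volume}
}

// collectAll collects metrics for all volumes in parallel, but allows
// at most maxMounts mounts to exist at any one time.
func collectAll(volumes []string, maxMounts int) []volumeMetrics {
    sem := make(chan struct{}, maxMounts) // bounds concurrent mounts
    results := make([]volumeMetrics, len(volumes))
    var wg sync.WaitGroup
    for i, v := range volumes {
        wg.Add(1)
        go func(i int, v string) {
            defer wg.Done()
            sem <- struct{}{}        // take a mount slot
            defer func() { <-sem }() // give it back
            results[i] = collectVolumeMetrics(v)
        }(i, v)
    }
    wg.Wait()
    return results
}

func main() {
    volumes := make([]string, 600)
    for i := range volumes {
        volumes[i] = fmt.Sprintf("vol%03d", i)
    }
    // 600 volumes x ~2s each: serial is the ~20 minutes above; with 50
    // concurrent mounts it is ~24s, holding 50 mounts' memory at once.
    metrics := collectAll(volumes, 50)
    fmt.Println("collected", len(metrics), "volumes")
}

The maxMounts cap is the knob: raising it trades memory (more live
mounts at once) for wall-clock collection time.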
>>>>>>>>
>>>>>>>> I am not sure if starting from the "worst example" (it
>>>>>>>> certainly is not) is a good place to start.
>>>>>>>
>>>>>>> I didn't understand your statement. Are you saying 600 volumes
>>>>>>> is a worst example?
>>>>>>>
>>>>>>>> That said, for any environment with that number of disposable
>>>>>>>> volumes, what kind of metrics actually make any sense/impact?
>>>>>>>
>>>>>>> The same metrics you track for long-running volumes. It is just
>>>>>>> that the way the metrics are interpreted will be different. On a
>>>>>>> long-running volume, you would look at the metrics and try to
>>>>>>> find out why the volume did not give the expected performance in
>>>>>>> the last hour. Whereas in this case, you would look at the
>>>>>>> metrics and find out why volumes that were created and deleted
>>>>>>> in the last hour didn't give the expected performance.
>>>>>>>
>>>>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>>>>> > conclusion about which path to take and what problems to
>>>>>>>> > solve.
>>>>>>>> >
>>>>>>>> > I will be happy to raise github issues based on our
>>>>>>>> > conclusions on this mail thread.
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>> sankarshan mukhopadhyay
>>>>>>>> <https://about.me/sankarshan.mukhopadhyay>

--
Pranith
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
