On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <[email protected]> wrote:
> To add an additional data point... The operator will need to regularly reconcile the true state of the gluster cluster with the desired state stored in kubernetes. This task will be required frequently (i.e., operator-framework defaults to every 5s even if there are no config changes).
>
> The actual amount of data we will need to query from the cluster is currently TBD and likely significantly affected by the Heketi/GD1 vs. GD2 choice.

Do we have any partial list of data we will gather? Just want to understand what this might entail already... (A couple of rough sketches on what this implies at scale are below the quoted thread.)

> -John
>
> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <[email protected]> wrote:
>
>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <[email protected]> wrote:
>>
>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri <[email protected]> wrote:
>>> > hi,
>>> > Quite a few commands to monitor gluster at the moment take almost a second to give output.
>>>
>>> Is this at the (most) minimum recommended cluster size?
>>
>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the cluster.
>>
>>> > Some categories of these commands:
>>> > 1) Any command that needs to do some sort of mount/glfs_init. Examples: the heal info family of commands, and statfs to find space availability. (On my laptop, on a replica 3 volume with all local bricks, glfs_init takes 0.3 seconds on average.)
>>> > 2) glusterd commands that need to wait for the previous command to unlock. If the previous command is something related to an lvm snapshot, which takes quite a few seconds, it would be even more time consuming.
>>> >
>>> > Nowadays container workloads have hundreds of volumes, if not thousands. If we want to serve any monitoring solution at this scale (I have seen customers use up to 600 volumes at a time, and it will only get bigger), and let's say collecting metrics takes 2 seconds per volume (taking the worst example, which has all major features enabled, like snapshot/geo-rep/quota etc.), that will mean it takes 20 minutes to collect the metrics of a cluster with 600 volumes. What are the ways in which we can make this number more manageable? I was initially thinking it may be possible to get gd2 to execute commands in parallel on different volumes, so potentially we could get this done in ~2 seconds. But quite a few of the metrics need a mount, or the equivalent of a mount (glfs_init), to collect information like statfs, the number of pending heals, quota usage, etc. This may lead to high memory usage, as the size of these mounts tends to be high.
>>>
>>> I am not sure if starting from the "worst example" (it certainly is not) is a good place to start from.
>>
>> I didn't understand your statement. Are you saying 600 volumes is a worst example?
>>
>>> That said, for any environment with that number of disposable volumes, what kind of metrics do actually make any sense/impact?
>>
>> Same metrics you track for long-running volumes. It is just that the way the metrics are interpreted will be different. On a long-running volume, you would look at the metrics and try to find out why the volume has not been giving the expected performance in the last hour. Whereas in this case, you would look at the metrics and find out why volumes that were created and deleted in the last hour didn't give the expected performance.
>>
>>> > I wanted to seek suggestions from others on how to come to a conclusion about which path to take and what problems to solve.
>>> >
>>> > I will be happy to raise github issues based on our conclusions on this mail thread.
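To make the reconcile cadence John mentions concrete, here is a rough, illustrative loop (this is not the actual operator or operator-framework code; the function names and the shape of the state are made up). The point is that the actual-state query runs every cycle, config change or not, so its per-volume cost is what limits how far this scales:

package main

import (
    "context"
    "fmt"
    "time"
)

// Placeholders for the real work: reading the desired state from the custom
// resource and querying the actual state from glusterd/GD2 (volume list,
// heal counts, capacity, ...).
func fetchDesiredState(ctx context.Context) (string, error) { return "desired", nil }
func fetchActualState(ctx context.Context) (string, error)  { return "actual", nil }
func converge(desired, actual string) error                 { return nil }

// reconcileLoop re-checks the cluster every `period`, whether or not the
// configuration changed.
func reconcileLoop(ctx context.Context, period time.Duration) {
    ticker := time.NewTicker(period)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            desired, _ := fetchDesiredState(ctx)
            // This is the expensive step at scale: it has to finish well
            // inside `period` even with hundreds of volumes.
            actual, _ := fetchActualState(ctx)
            if err := converge(desired, actual); err != nil {
                fmt.Println("reconcile error:", err)
            }
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    reconcileLoop(ctx, 5*time.Second) // 5s cadence, per the default John mentions
}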
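On the scale arithmetic from the quoted thread: collected serially at ~2 seconds per volume, 600 volumes is 600 x 2 s = 1,200 s, i.e. the 20 minutes mentioned above. Below is a rough sketch (again illustrative, not gd2 or any existing exporter code) of bounding how many per-volume collections run at once; this is the trade-off between wall-clock time and how many glfs_init-style mounts are held open simultaneously:

package main

import (
    "fmt"
    "sync"
    "time"
)

// collectVolumeMetrics stands in for the expensive per-volume work
// (a glfs_init-style mount plus statfs, pending-heal counts, quota usage, ...).
// The 2-second sleep models the worst-case figure from the thread.
func collectVolumeMetrics(volume string) string {
    time.Sleep(2 * time.Second)
    return volume + ": ok"
}

// collectAll gathers metrics for all volumes, with at most maxInFlight
// collections (and hence mount-like contexts) active at any one time.
func collectAll(volumes []string, maxInFlight int) []string {
    sem := make(chan struct{}, maxInFlight)
    results := make([]string, len(volumes))
    var wg sync.WaitGroup
    for i, v := range volumes {
        wg.Add(1)
        go func(i int, v string) {
            defer wg.Done()
            sem <- struct{}{}        // take a collection slot
            defer func() { <-sem }() // free it when done
            results[i] = collectVolumeMetrics(v)
        }(i, v)
    }
    wg.Wait()
    return results
}

func main() {
    volumes := make([]string, 600)
    for i := range volumes {
        volumes[i] = fmt.Sprintf("vol-%03d", i)
    }
    start := time.Now()
    collectAll(volumes, 50) // 600 volumes / 50 in flight -> ~12 waves of ~2 s each
    fmt.Println("collected", len(volumes), "volumes in", time.Since(start))
}

With 50 collections in flight, the same 600 volumes finish in roughly 24 seconds instead of 20 minutes, and memory is capped at 50 concurrent mounts rather than 600. The right bound depends on how much memory each glfs_init context actually costs, which is something we would need to measure.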
--
Pranith

_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
