Just saw a recently posted issue by Serkan Çoban that looks very similar: http://lists.gluster.org/pipermail/gluster-users/2018-April/033915.html
Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>

On Tue, Apr 17, 2018 at 9:44 PM, Artem Russakovskii <[email protected]> wrote:

> Following up here on a related and very serious for us issue.
>
> I took down one of the 4 replicate gluster servers for maintenance today.
> There are 2 gluster volumes totaling about 600GB. Not that much data. After
> the server comes back online, it starts auto healing and pretty much all
> operations on gluster freeze for many minutes.
>
> For example, I was trying to run an ls -alrt in a folder with 7300 files,
> and it took a good 15-20 minutes before returning.
>
> During this time, I can see iostat show 100% utilization on the brick,
> heal status takes many minutes to return, glusterfsd uses up tons of CPU (I
> saw it spike to 600%). gluster already has massive performance issues for
> me, but healing after a 4-hour downtime is on another level of bad perf.
>
> For example, this command took many minutes to run:
>
> gluster volume heal androidpolice_data3 info summary
> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 91
> Number of entries in heal pending: 90
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick forge:/mnt/forge_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick hive:/mnt/hive_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick citadel:/mnt/citadel_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
>
> Statistics showed a diminishing number of failed heals:
> ...
> Ending time of crawl: Tue Apr 17 21:13:08 2018
>
> Type of crawl: INDEX
> No. of entries healed: 2
> No. of entries in split-brain: 0
> No. of heal failed entries: 102
>
> Starting time of crawl: Tue Apr 17 21:13:09 2018
>
> Ending time of crawl: Tue Apr 17 21:14:30 2018
>
> Type of crawl: INDEX
> No. of entries healed: 4
> No. of entries in split-brain: 0
> No. of heal failed entries: 91
>
> Starting time of crawl: Tue Apr 17 21:14:31 2018
>
> Ending time of crawl: Tue Apr 17 21:15:34 2018
>
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 88
> ...
>
> Eventually, everything heals and goes back to at least where the roof
> isn't on fire anymore.
>
> The server stats and volume options were given in one of the previous
> replies to this thread.
>
> Any ideas or things I could run and show the output of to help diagnose?
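
One thing that may help when collecting data for diagnosis is turning the heal summary quoted above into numbers that can be logged over time. The following is only a minimal sketch: it assumes the text layout of `gluster volume heal <volume> info summary` matches the output quoted above, and the names `parse_heal_summary` and `total_pending` are hypothetical helpers, not part of any Gluster tooling. The sample is the first two bricks from the quoted output.

```python
from typing import Dict

# Sample copied from the output quoted above (first two bricks only).
SAMPLE = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1

Brick forge:/mnt/forge_block4/androidpolice_data3
Status: Connected
Total Number of entries: 87
Number of entries in heal pending: 86
Number of entries in split-brain: 0
Number of entries possibly healing: 1
"""

PENDING_KEY = "Number of entries in heal pending"


def parse_heal_summary(text: str) -> Dict[str, Dict[str, int]]:
    """Map each brick to its integer counters (hypothetical helper)."""
    bricks: Dict[str, Dict[str, int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A "Brick host:/path" line starts a new section.
            current = line[len("Brick "):]
            bricks[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Keep numeric counters only; "Status: Connected" is skipped.
            if value.isdigit():
                bricks[current][key.strip()] = int(value)
    return bricks


def total_pending(summary: Dict[str, Dict[str, int]]) -> int:
    """Sum the heal-pending counters across all bricks."""
    return sum(counters.get(PENDING_KEY, 0) for counters in summary.values())


if __name__ == "__main__":
    summary = parse_heal_summary(SAMPLE)
    for brick, counters in summary.items():
        print(brick, counters.get(PENDING_KEY, 0))
    print("total pending:", total_pending(summary))
```

Feeding it the saved output of the command at regular intervals would show whether the pending count is actually shrinking, which is easier to compare across runs than the raw text.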
> I'm also very open to working with someone on the team on a live debugging
> session if there's interest.
>
> Thank you.
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>
> On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii <[email protected]> wrote:
>
>> Hi Vlad,
>>
>> I actually saw that post already and even asked a question 4 days ago
>> (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
>> The accepted answer also seems to go against your suggestion to enable
>> direct-io-mode as it says it should be disabled for better performance
>> when used just for file accesses.
>>
>> It'd be great if someone from the Gluster team chimed in about this
>> thread.
>>
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>
>> On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov <[email protected]> wrote:
>>
>>> Wish I knew or was able to get detailed description of those options
>>> myself.
>>> here is direct-io-mode https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
>>> Same as you I ran tests on a large volume of files, finding that main
>>> delays are in attribute calls, ending up with those mount options to add
>>> performance.
>>> I discovered those options through basically googling this user list
>>> with people sharing their tests.
>>> Not sure I would share your optimism, and rather than going up I
>>> downgraded to 3.12 and have no dir view issue now. Though I had to recreate
>>> the cluster and had to re-add bricks with existing data.
>>>
>>> On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii <[email protected]> wrote:
>>>
>>>> Hi Vlad,
>>>>
>>>> I'm using only localhost: mounts.
>>>>
>>>> Can you please explain what effect each option has on performance
>>>> issues shown in my posts?
>>>> "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>>> From what I remember, direct-io-mode=enable didn't make a difference in my
>>>> tests, but I suppose I can try again. The explanations about direct-io-mode
>>>> are quite confusing on the web in various guides, saying enabling it could
>>>> make performance worse in some situations and better in others due to OS
>>>> file cache.
>>>>
>>>> There are also these gluster volume settings, adding to the confusion:
>>>> Option: performance.strict-o-direct
>>>> Default Value: off
>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>
>>>> Option: performance.nfs.strict-o-direct
>>>> Default Value: off
>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>
>>>> Re: 4.0. I moved to 4.0 after finding out that it fixes the
>>>> disappearing dirs bug related to cluster.readdir-optimize if you remember
>>>> (http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
>>>> I was already on 3.13 by then, and 4.0 resolved the issue. It's been stable
>>>> for me so far, thankfully.
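
To make the mount-option discussion above concrete, here is a rough sketch that renders the glusterfs /etc/fstab entry quoted further down in this thread with the suggested options appended. It only shows where the options would go in the line; it makes no claim about whether these values help, which is exactly the open question here. The function name `fstab_entry` is a hypothetical helper, and the option values are copied from the quoted suggestion rather than being recommendations.

```python
from typing import Sequence

# Options already present in the fstab entry quoted later in this thread.
BASE_OPTIONS = ["defaults", "_netdev"]

# Options from the suggestion quoted above; copied as-is, not tuned values.
SUGGESTED_OPTIONS = [
    "negative-timeout=10",
    "attribute-timeout=30",
    "fopen-keep-cache",
    "direct-io-mode=enable",
    "fetch-attempts=5",
]


def fstab_entry(source: str, mountpoint: str, options: Sequence[str],
                fstype: str = "glusterfs") -> str:
    """Render one fstab line: source, mountpoint, type, options, dump, pass."""
    return f"{source} {mountpoint} {fstype} {','.join(options)} 0 0"


if __name__ == "__main__":
    print(fstab_entry("localhost:/apkmirror_data1", "/mnt/apkmirror_data1",
                      BASE_OPTIONS + SUGGESTED_OPTIONS))
```

Keeping the base and suggested options in separate lists makes it easy to test them one at a time, for example mounting with and without direct-io-mode=enable and comparing the same copy operation.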
>>>>
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>> --
>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>
>>>> On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov <[email protected]> wrote:
>>>>
>>>>> you definitely need mount options to /etc/fstab
>>>>> use ones from here http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>>>>>
>>>>> I went on with using local mounts to achieve performance as well
>>>>>
>>>>> Also, 3.12 or 3.10 branches would be preferable for production
>>>>>
>>>>> On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii <[email protected]> wrote:
>>>>>
>>>>>> Hi again,
>>>>>>
>>>>>> I'd like to expand on the performance issues and plead for help.
>>>>>> Here's one case which shows these odd hiccups: https://i.imgur.com/CXBPjTK.gifv.
>>>>>>
>>>>>> In this GIF where I switch back and forth between copy operations on
>>>>>> 2 servers, I'm copying a 10GB dir full of .apk and image files.
>>>>>>
>>>>>> On server "hive" I'm copying straight from the main disk to an
>>>>>> attached volume block (xfs). As you can see, the transfers are relatively
>>>>>> speedy and don't hiccup.
>>>>>> On server "citadel" I'm copying the same set of data to a 4-replicate
>>>>>> gluster which uses block storage as a brick. As you can see, performance
>>>>>> is much worse, and there are frequent pauses for many seconds where nothing
>>>>>> seems to be happening - just freezes.
>>>>>>
>>>>>> All 4 servers have the same specs, and all of them have performance
>>>>>> issues with gluster and no such issues when raw xfs block storage is
>>>>>> used.
>>>>>>
>>>>>> hive has long finished copying the data, while citadel is barely
>>>>>> chugging along and is expected to take probably half an hour to an hour.
>>>>>> I have over 1TB of data to migrate, at which point if we went live, I'm not
>>>>>> even sure gluster would be able to keep up instead of bringing the
>>>>>> machines and services down.
>>>>>>
>>>>>>
>>>>>> Here's the cluster config, though it didn't seem to make any
>>>>>> difference performance-wise before I applied the customizations vs after.
>>>>>>
>>>>>> Volume Name: apkmirror_data1
>>>>>> Type: Replicate
>>>>>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>>>> Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>>>>> Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>>>>> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>>>>> Options Reconfigured:
>>>>>> cluster.quorum-count: 1
>>>>>> cluster.quorum-type: fixed
>>>>>> network.ping-timeout: 5
>>>>>> network.remote-dio: enable
>>>>>> performance.rda-cache-limit: 256MB
>>>>>> performance.readdir-ahead: on
>>>>>> performance.parallel-readdir: on
>>>>>> network.inode-lru-limit: 500000
>>>>>> performance.md-cache-timeout: 600
>>>>>> performance.cache-invalidation: on
>>>>>> performance.stat-prefetch: on
>>>>>> features.cache-invalidation-timeout: 600
>>>>>> features.cache-invalidation: on
>>>>>> cluster.readdir-optimize: on
>>>>>> performance.io-thread-count: 32
>>>>>> server.event-threads: 4
>>>>>> client.event-threads: 4
>>>>>> performance.read-ahead: off
>>>>>> cluster.lookup-optimize: on
>>>>>> performance.cache-size: 1GB
>>>>>> cluster.self-heal-daemon: enable
>>>>>> transport.address-family: inet
>>>>>> nfs.disable: on
>>>>>> performance.client-io-threads: on
>>>>>>
>>>>>>
>>>>>> The mounts are done as follows in /etc/fstab:
>>>>>> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>>>>> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>>>>>>
>>>>>> I'm really not sure if direct-io-mode mount tweaks would do anything
>>>>>> here, what the value should be set to, and what it is by default.
>>>>>>
>>>>>> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20 CPUs, hosted by
>>>>>> Linode.
>>>>>>
>>>>>> I'd really appreciate any help in the matter.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>> --
>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>
>>>>>> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to squeeze performance out of gluster on 4 80GB RAM
>>>>>>> 20-CPU machines where Gluster runs on attached block storage (Linode)
>>>>>>> in (4 replicate bricks), and so far everything I tried results in
>>>>>>> sub-optimal performance.
>>>>>>>
>>>>>>> There are many files - mostly images, several million - and many
>>>>>>> operations take minutes, copying multiple files (even if they're small)
>>>>>>> suddenly freezes up for seconds at a time, then continues, iostat
>>>>>>> frequently shows large r_await and w_awaits with 100% utilization for
>>>>>>> the attached block device, etc.
>>>>>>>
>>>>>>> But anyway, there are many guides out there for small-file
>>>>>>> performance improvements, but more explanation is needed, and I think
>>>>>>> more tweaks should be possible.
>>>>>>>
>>>>>>> My question today is about performance.cache-size. Is this a size of
>>>>>>> cache in RAM? If so, how do I view the current cache size to see if it
>>>>>>> gets full and I should increase its size? Is it advisable to bump it up if I
>>>>>>> have many tens of gigs of RAM free?
>>>>>>>
>>>>>>>
>>>>>>> More generally, in the last 2 months since I first started working
>>>>>>> with gluster and set a production system live, I've been feeling
>>>>>>> frustrated because Gluster has a lot of poorly-documented and confusing
>>>>>>> options. I really wish documentation could be improved with examples and
>>>>>>> better explanations.
>>>>>>>
>>>>>>> Specifically, it'd be absolutely amazing if the docs offered a
>>>>>>> strategy for setting each value and ways of determining more optimal
>>>>>>> values. For example, for performance.cache-size, if it said something
>>>>>>> like "run command abc to see your current cache size, and if it's hurting, up
>>>>>>> it, but be aware that it's limited by RAM," it'd be already a huge
>>>>>>> improvement to the docs. And so on with other options.
>>>>>>>
>>>>>>>
>>>>>>> The gluster team is quite helpful on this mailing list, but in a
>>>>>>> reactive rather than proactive way. Perhaps it's tunnel vision once
>>>>>>> you've worked on a project for so long where less technical explanations and
>>>>>>> even proper documentation of options takes a back seat, but I encourage you
>>>>>>> to be more proactive about helping us understand and optimize Gluster.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>> --
>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
