Re: [Gluster-users] poor performance

Joe Julian Wed, 14 Dec 2022 07:44:21 -0800

PHP is not a good filesystem user. I wrote about this a while back:
https://joejulian.name/post/optimizing-web-performance-with-glusterfs/


On December 14, 2022 6:16:54 AM PST, Jaco Kroon <j...@uls.co.za> wrote:
>Hi Peter,
>
>Yes, we could, but with ~1000 vhosts that gets extremely cumbersome to 
>maintain, and to let clients manage their own stuff.  Essentially, unless 
>the htdocs/ folder is on a single filesystem we're going to need to 
>get involved with each and every update, which isn't feasible.  In that case 
>I'd rather partition the vhosts such that half run on one server and the 
>other half on the other server, and risk downtime.
>
>Our experience indicates that the slow part is in fact not the execution of 
>the php code but php locating the files.  It tries a bunch of folders 
>with stat() and/or open() and gets the ordering wrong, resulting in numerous 
>ENOENT errors before hitting the right locations, after which it actually does 
>quite well.  On code I wrote, which does NOT suffer this problem quite as badly 
>as wordpress, we find that from a local filesystem we get 200ms on full 
>processing (idle system, nvme physical disk, although I doubt this matters 
>since the fs layer should have most of this cached in RAM anyway) vs 300ms on 
>top of glusterfs.  The bricks barely ever go to disk (fs layer caching) 
>according to the system stats we gathered.
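To illustrate the lookup pattern described above, here is a minimal Python sketch (not PHP's actual resolver) that probes a list of candidate directories in order with stat() and counts the ENOENT misses before the first hit. The function name resolve_include and the directory layout are hypothetical, chosen only for illustration. On a network filesystem each of those misses can cost a round trip unless something caches the negative answer.

```python
import os
import tempfile


def resolve_include(name, search_dirs):
    """Return (path or None, number of ENOENT misses) for name.

    Hypothetical helper: probes search_dirs in order, the way an
    include-path style resolver would, and counts failed lookups.
    """
    misses = 0
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        try:
            os.stat(candidate)
        except FileNotFoundError:  # errno ENOENT
            misses += 1
            continue
        return candidate, misses
    return None, misses


def demo():
    # Build three directories; only the last one holds the file.
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, d) for d in ("a", "b", "c")]
        for d in dirs:
            os.mkdir(d)
        with open(os.path.join(dirs[2], "lib.php"), "w") as fh:
            fh.write("<?php\n")
        found, misses = resolve_include("lib.php", dirs)
        return os.path.basename(found), misses


if __name__ == "__main__":
    print(demo())
```

With the file in the third directory, two lookups fail before the one that succeeds; putting the most likely directory first removes those misses.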
>
>How do big hosting entities like wordpress.org (iirc) deal with this?  
>Because honestly, I doubt they do single-server setups. Then again, I reckon 
>that if you ONLY host wordpress (based on experience) it's possible to have a 
>single master copy of wordpress on each server, with a lsync'ed themes/ folder 
>for each vhost and a shared (glusterfs) uploads folder.  Then enter things like 
>wordfence that insist on being able to write to alternative locations.
>
>Anyway, barring using glusterfs we can certainly come up with solutions, which 
>may even include having *some* sites run on the shared setup, and others on 
>single-host, possibly with something like lsync keeping a "semi hot standby" 
>up to date.  That does get complex though.
>
>Our ideal solution remains a fairly performant clustered filesystem such as 
>glusterfs (with which we have a lot of experience, including using it for 
>large email clusters where its performance is excellent, but I would have 
>LOVED inotify support).  With nl-cache the performance is adequate; however, 
>the cache-invalidation doesn't seem to function properly, which I believe can 
>be solved, either by fixing settings or by fixing code bugs.  Basically, 
>whenever a file is modified or a new file is created, clients should be alerted 
>in order to invalidate cache. Since this cluster is mostly-read, some write, 
>and there are only two clients, this should be perfectly manageable, and there 
>seem to be hints of this in the gluster volume options already:
>
># gluster volume get volname all | grep invalid
>performance.quick-read-cache-invalidation false (DEFAULT)
>performance.ctime-invalidation           false (DEFAULT)
>performance.cache-invalidation           on
>performance.global-cache-invalidation    true (DEFAULT)
>features.cache-invalidation              on
>features.cache-invalidation-timeout      600
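The behaviour asked for above (negative entries that expire, and that are dropped as soon as the file is created) can be sketched in a few lines of Python. This is a toy model and not how nl-cache is implemented; the class NegativeLookupCache, its ttl parameter and the invalidate hook are all assumptions made for illustration.

```python
import os
import tempfile
import time


class NegativeLookupCache:
    """Toy cache of 'path does not exist' answers with a TTL."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._missing = {}  # path -> time the miss was recorded

    def exists(self, path):
        recorded = self._missing.get(path)
        if recorded is not None:
            if self.clock() - recorded < self.ttl:
                return False  # answered from cache, no filesystem call
            del self._missing[path]  # entry expired, fall through
        found = os.path.exists(path)
        if not found:
            self._missing[path] = self.clock()
        return found

    def invalidate(self, path):
        """Call when another client reports that path was created."""
        self._missing.pop(path, None)


def demo():
    cache = NegativeLookupCache(ttl=60.0)
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "new.php")
        first = cache.exists(path)   # real miss, now cached as absent
        open(path, "w").close()      # file created behind the cache
        stale = cache.exists(path)   # cache still answers "absent"
        cache.invalidate(path)       # notification that the file exists
        fresh = cache.exists(path)   # filesystem is consulted again
        return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

The stale answer in the middle is the failure mode: without an invalidation message, or a TTL that actually expires, the client keeps answering from its cache even though the file now exists.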
>
>Kind Regards,
>Jaco
>
>On 2022/12/14 14:56, Péter Károly JUHÁSZ wrote:
>
>> We did this with WordPress too. It uses tons of static files; executing 
>> them is the slow part. You can rsync them and use the upload dir from 
>> glusterfs.
>> 
>> On Wed, 14 Dec 2022 at 13:20, Jaco Kroon <j...@uls.co.za> wrote:
>> 
>>     Hi,
>> 
>>     The problem is files generated by wordpress, and uploads etc ...
>>     so copying them to frontend hosts, whilst making perfect sense,
>>     assumes I have control over the code to not write to the local
>>     front-end, else we could have relied on something like lsync.
>> 
>>     As it stands, performance is acceptable with nl-cache enabled, but
>>     the fact that we get those ENOENT errors is highly problematic.
>> 
>> 
>>     Kind Regards,
>>     Jaco Kroon
>> 
>> 
>>     On 2022/12/14 14:04, Péter Károly JUHÁSZ wrote:
>> 
>>>     When we used glusterfs for websites, we copied the web dir from
>>>     gluster to local on frontend boots, then served it from there.
>>> 
>>>     On Wed, 14 Dec 2022 at 12:49, Jaco Kroon <j...@uls.co.za> wrote:
>>> 
>>>         Hi All,
>>> 
>>>         We've got a glusterfs cluster that houses some php web sites.
>>> 
>>>         This is generally considered a bad idea and we can see why.
>>> 
>>>         With performance.nl-cache on it actually turns out to be very
>>>         reasonable; however, with this turned off performance is
>>>         roughly 5x worse, meaning a request that would take sub 500ms
>>>         now takes 2500ms.  In other cases we see far, far worse, e.g.
>>>         with nl-cache a request takes ~1500ms, without it ~30s (20x
>>>         worse).
>>> 
>>>         So why not use nl-cache?  Well, it results in readdir reporting
>>>         files which then fail to open with ENOENT.  The cache also never
>>>         clears even though the configuration says nl-cache entries should
>>>         only be cached for 60s.  Even for "ls -lah" in affected folders
>>>         you'll notice "????" entries in place of the attributes on
>>>         files.  If this recovered in a reasonable time (say, a few
>>>         seconds), sure.
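One way to quantify this symptom is to list a directory and lstat() every entry, collecting the names that readdir returned but that cannot be looked up. A minimal Python sketch follows; the function name find_ghost_entries is hypothetical, and on a quiescent local filesystem it should return an empty list.

```python
import os
import tempfile


def find_ghost_entries(directory):
    """Return names listed by readdir that fail lstat() with ENOENT."""
    ghosts = []
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:  # listed, but the lookup says ENOENT
            ghosts.append(name)
    return sorted(ghosts)


def demo():
    # Sanity check against a temporary local directory.
    with tempfile.TemporaryDirectory() as root:
        for name in ("index.php", "style.css"):
            open(os.path.join(root, name), "w").close()
        return find_ghost_entries(root)


if __name__ == "__main__":
    print(demo())
```

Running it repeatedly against an affected folder on the mount, with and without nl-cache, would show whether the inconsistent entries ever clear and roughly how long that takes.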
>>> 
>>>         # gluster volume info
>>>         Type: Replicate
>>>         Volume ID: cbe08331-8b83-41ac-b56d-88ef30c0f5c7
>>>         Status: Started
>>>         Snapshot Count: 0
>>>         Number of Bricks: 1 x 2 = 2
>>>         Transport-type: tcp
>>>         Options Reconfigured:
>>>         performance.nl-cache: on
>>>         cluster.readdir-optimize: on
>>>         config.client-threads: 2
>>>         config.brick-threads: 4
>>>         config.global-threading: on
>>>         performance.iot-pass-through: on
>>>         storage.fips-mode-rchecksum: on
>>>         cluster.granular-entry-heal: enable
>>>         cluster.data-self-heal-algorithm: full
>>>         cluster.locking-scheme: granular
>>>         client.event-threads: 2
>>>         server.event-threads: 2
>>>         transport.address-family: inet
>>>         nfs.disable: on
>>>         cluster.metadata-self-heal: off
>>>         cluster.entry-self-heal: off
>>>         cluster.data-self-heal: off
>>>         cluster.self-heal-daemon: on
>>>         server.allow-insecure: on
>>>         features.ctime: off
>>>         performance.io-cache: on
>>>         performance.cache-invalidation: on
>>>         features.cache-invalidation: on
>>>         performance.qr-cache-timeout: 600
>>>         features.cache-invalidation-timeout: 600
>>>         performance.io-cache-size: 128MB
>>>         performance.cache-size: 128MB
>>> 
>>>         Are there any other recommendations short of abandoning all hope
>>>         of redundancy and reverting to a single-server setup (for the
>>>         web code at least)?  Currently the cost of the redundancy seems
>>>         to outweigh the benefit.
>>> 
>>>         Glusterfs version 10.2, with a patch for --inode-table-size.
>>>         Mounts happen with:
>>> 
>>>         /usr/sbin/glusterfs --acl --reader-thread-count=2 \
>>>             --lru-limit=524288 --inode-table-size=524288 \
>>>             --invalidate-limit=16 --background-qlen=32 \
>>>             --fuse-mountopts=nodev,nosuid,noexec,noatime --process-name fuse \
>>>             --volfile-server=127.0.0.1 --volfile-id=gv_home \
>>>             --fuse-mountopts=nodev,nosuid,noexec,noatime /home
>>> 
>>>         Kind Regards,
>>>         Jaco
>>> 
>>>         ________
>>> 
>>> 
>>> 
>>>         Community Meeting Calendar:
>>> 
>>>         Schedule -
>>>         Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>         Bridge: https://meet.google.com/cpu-eiue-hvk
>>>         Gluster-users mailing list
>>>         Gluster-users@gluster.org
>>>         https://lists.gluster.org/mailman/listinfo/gluster-users
>>> 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

