Following up here on a related issue that is very serious for us.
I took down one of the 4 replicate gluster servers for
maintenance today. There are 2 gluster volumes totaling about
600GB. Not that much data. After the server comes back online, it
starts auto healing and pretty much all operations on gluster
freeze for many minutes.
For example, I was trying to run an ls -alrt in a folder with
7300 files, and it took a good 15-20 minutes before returning.
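(As an aside, part of why ls -alrt is so painful here: it stats every entry, and on a gluster FUSE mount each stat becomes one or more network FOPs, whereas a plain unsorted listing is readdir-only. A local sketch of the difference, using throwaway files:)

```shell
# Sketch: "ls -alrt" does an lstat() on every entry (each one or more
# network FOPs on a gluster FUSE mount), while "ls -f" is readdir-only.
d=$(mktemp -d)
for i in $(seq 1 100); do touch "$d/f$i"; done
time ls -f "$d" > /dev/null      # readdir only, no per-file stat
time ls -alrt "$d" > /dev/null   # readdir + lstat per entry
rm -rf "$d"
```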
During this time, I can see iostat show 100% utilization on the
brick, heal status takes many minutes to return, glusterfsd uses
up tons of CPU (I saw it spike to 600%). gluster already has
massive performance issues for me, but healing after a 4-hour
downtime is on another level of bad perf.
For example, this command took many minutes to run:
gluster volume heal androidpolice_data3 info summary
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick forge:/mnt/forge_block4/androidpolice_data3
Status: Connected
Total Number of entries: 87
Number of entries in heal pending: 86
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick hive:/mnt/hive_block4/androidpolice_data3
Status: Connected
Total Number of entries: 87
Number of entries in heal pending: 86
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
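(Since the summary command itself takes minutes here, one way to track progress from a single run is to total the per-brick pending counts; the awk below is just a sketch keyed to the output format above:)

```shell
# Total the "heal pending" entries across all bricks from one run
# of the summary command shown above.
gluster volume heal androidpolice_data3 info summary \
  | awk -F': ' '/Number of entries in heal pending/ { total += $2 }
                END { print "total pending:", total }'
```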
Statistics showed a diminishing number of failed heals:
...
Ending time of crawl: Tue Apr 17 21:13:08 2018
Type of crawl: INDEX
No. of entries healed: 2
No. of entries in split-brain: 0
No. of heal failed entries: 102
Starting time of crawl: Tue Apr 17 21:13:09 2018
Ending time of crawl: Tue Apr 17 21:14:30 2018
Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 0
No. of heal failed entries: 91
Starting time of crawl: Tue Apr 17 21:14:31 2018
Ending time of crawl: Tue Apr 17 21:15:34 2018
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 88
...
Eventually, everything heals and things go back to a state where at least the roof isn't on fire anymore.
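In case it helps the discussion, here are the self-heal throttling knobs I've seen mentioned for reducing heal impact on client I/O; I haven't verified their effect myself, so treat this strictly as a sketch to experiment with:

```shell
# Sketch only: options that reportedly throttle self-heal impact.
# Verify names and defaults with "gluster volume set help" first.
gluster volume set androidpolice_data3 cluster.background-self-heal-count 4
gluster volume set androidpolice_data3 cluster.self-heal-window-size 8
# "diff" heals only changed blocks (less I/O, more checksum CPU);
# "full" copies whole files (more I/O, less CPU).
gluster volume set androidpolice_data3 cluster.data-self-heal-algorithm diff
```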
The server stats and volume options were given in one of the
previous replies to this thread.
Any ideas or things I could run and show the output of to help
diagnose? I'm also very open to working with someone on the team
on a live debugging session if there's interest.
Thank you.
Sincerely,
Artem
--
Founder, Android Police <http://www.androidpolice.com>, APK
Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>
On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii <archon...@gmail.com> wrote:
Hi Vlad,
I actually saw that post already and even asked a question there 4
days ago
(https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
The accepted answer also seems to go against your suggestion to
enable direct-io-mode, as it says it should be disabled for better
performance when the volume is used just for file access.
It'd be great if someone from the Gluster team chimed in
about this thread.
Sincerely,
Artem
On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov <vladk...@gmail.com> wrote:
I wish I knew, or was able to get, a detailed description of
those options myself.
Here is direct-io-mode:
https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
Same as you, I ran tests on a large volume of files and found
that the main delays are in attribute calls; I ended up with
those mount options to improve performance.
I discovered those options basically by googling this user
list, where people share their test results.
Not sure I share your optimism; rather than going up, I
downgraded to 3.12 and have no dir view issue now, though I
had to recreate the cluster and re-add bricks with existing
data.
On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii <archon...@gmail.com> wrote:
Hi Vlad,
I'm using only localhost: mounts.
Can you please explain what effect each option has on
performance issues shown in my posts?
"negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
From what I remember, direct-io-mode=enable didn't make a
difference in my tests, but I suppose I can try again. The
explanations of direct-io-mode in various guides on the web
are quite confusing, saying enabling it could make
performance worse in some situations and better in others
because of the OS file cache.
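Since direct-io-mode is a client-side mount option rather than a volume option, it should be A/B testable by just remounting; a sketch using my local mount (paths from my setup, untested):

```shell
# Sketch: remount with direct-io-mode set explicitly, to compare
# enable vs. disable vs. the default on the same workload.
umount /mnt/apkmirror_data1
mount -t glusterfs -o direct-io-mode=enable \
  localhost:/apkmirror_data1 /mnt/apkmirror_data1
```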
There are also these gluster volume settings, adding
to the confusion:
Option: performance.strict-o-direct
Default Value: off
Description: This option when set to off, ignores the
O_DIRECT flag.
Option: performance.nfs.strict-o-direct
Default Value: off
Description: This option when set to off, ignores the
O_DIRECT flag.
Re: 4.0. I moved to 4.0 after finding out that it fixes the
disappearing-dirs bug related to cluster.readdir-optimize, if
you remember
(http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
I was already on 3.13 by then, and 4.0 resolved the issue.
It's been stable for me so far, thankfully.
Sincerely,
Artem
On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov <vladk...@gmail.com> wrote:
You definitely need mount options in /etc/fstab; use the
ones from here:
http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
I went with using local mounts to achieve performance as well.
Also, the 3.12 or 3.10 branches would be preferable for
production.
On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii <archon...@gmail.com> wrote:
Hi again,
I'd like to expand on the performance issues and plead for
help. Here's one case which shows these odd hiccups:
https://i.imgur.com/CXBPjTK.gifv.
In this GIF where I switch back and forth
between copy operations on 2 servers, I'm
copying a 10GB dir full of .apk and image files.
On server "hive" I'm copying straight from
the main disk to an attached volume block
(xfs). As you can see, the transfers are
relatively speedy and don't hiccup.
On server "citadel" I'm copying the same set
of data to a 4-replicate gluster which uses
block storage as a brick. As you can see,
performance is much worse, and there are
frequent pauses for many seconds where
nothing seems to be happening - just freezes.
All 4 servers have the same specs, and all of
them have performance issues with gluster and
no such issues when raw xfs block storage is
used.
hive has long finished copying the data, while citadel is
barely chugging along and is expected to take probably half
an hour to an hour. I have over 1TB of data to migrate, and
if we went live at that point, I'm not even sure gluster
would be able to keep up rather than bringing the machines
and services down.
Here's the cluster config, though it didn't seem to make any
difference performance-wise before I applied the
customizations vs. after:
Volume Name: apkmirror_data1
Type: Replicate
Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
Brick2: forge:/mnt/forge_block1/apkmirror_data1
Brick3: hive:/mnt/hive_block1/apkmirror_data1
Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 5
network.remote-dio: enable
performance.rda-cache-limit: 256MB
performance.readdir-ahead: on
performance.parallel-readdir: on
network.inode-lru-limit: 500000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.io-thread-count: 32
server.event-threads: 4
client.event-threads: 4
performance.read-ahead: off
cluster.lookup-optimize: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
The mounts are done as follows in /etc/fstab:
/dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
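If I were to fold in the FUSE mount options discussed earlier in this thread, the glusterfs line might look like this (an untested sketch; each option would need verifying against the client version in use):

```shell
# /etc/fstab sketch: gluster mount plus the FUSE options discussed
# earlier in this thread (unverified on this setup).
localhost:/apkmirror_data1  /mnt/apkmirror_data1  glusterfs  defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5  0 0
```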
I'm really not sure if direct-io-mode mount
tweaks would do anything here, what the value
should be set to, and what it is by default.
The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM,
20 CPUs, hosted by Linode.
I'd really appreciate any help in the matter.
Thank you.
Sincerely,
Artem
On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <archon...@gmail.com> wrote:
Hi,
I'm trying to squeeze performance out of gluster on four
machines with 80GB of RAM and 20 CPUs each, where Gluster
runs on attached block storage (Linode) as 4 replicated
bricks, and so far everything I've tried results in
sub-optimal performance.
There are many files (mostly images, several million), and
many operations take minutes. Copying multiple files (even
small ones) suddenly freezes up for seconds at a time, then
continues; iostat frequently shows large r_await and
w_await values with 100% utilization on the attached block
device; and so on.
Anyway, there are many guides out there for small-file
performance improvements, but more explanation is needed,
and I think more tweaks should be possible.
My question today is about performance.cache-size. Is this
the size of a cache in RAM? If so, how do I view the
current cache usage to see whether it gets full and I
should increase its size? Is it advisable to bump it up if
I have many tens of gigs of RAM free?
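The only way I know of to peek at this is a client statedump: sending SIGUSR1 to the FUSE client process writes a dump (under /var/run/gluster by default) that should include an io-cache section. A hedged sketch; the pgrep pattern is my guess at matching the mount process:

```shell
# Sketch: dump the FUSE client's internal state, then inspect the
# io-cache translator section where performance.cache-size applies.
kill -USR1 "$(pgrep -f 'glusterfs.*apkmirror_data1' | head -n1)"
sleep 1
grep -A 5 'io-cache' /var/run/gluster/glusterdump.*
```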
More generally, in the last 2 months
since I first started working with
gluster and set a production system live,
I've been feeling frustrated because
Gluster has a lot of poorly-documented
and confusing options. I really wish
documentation could be improved with
examples and better explanations.
Specifically, it'd be absolutely amazing if the docs
offered a strategy for setting each value and ways of
determining more optimal values. For example, for
performance.cache-size, if the docs said something like
"run command abc to see your current cache size, and if
it's hurting, raise it, but be aware that it's limited by
RAM," it'd already be a huge improvement. And so on with
other options.
The gluster team is quite helpful on this mailing list, but
in a reactive rather than proactive way. Perhaps it's
tunnel vision from working on a project for so long, where
less-technical explanations and even proper documentation
of options take a back seat, but I encourage you to be more
proactive about helping us understand and optimize Gluster.
Thank you.
Sincerely,
Artem
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users