Following up here on a related issue that is very serious
for us.
I took down one of the 4 gluster replica servers for
maintenance today. There are 2 gluster volumes totaling
about 600GB, which is not that much data. After the server
came back online, it started auto-healing, and pretty much
all operations on gluster froze for many minutes.
For example, I tried to run ls -alrt in a folder with 7300
files, and it took a good 15-20 minutes to return.
During this time, iostat showed 100% utilization on the
brick, heal status took many minutes to return, and
glusterfsd used tons of CPU (I saw it spike to 600%).
gluster already has massive performance issues for me,
but healing after a 4-hour downtime is on another level
of bad performance.
For example, this command took many minutes to run:
gluster volume heal androidpolice_data3 info summary
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick forge:/mnt/forge_block4/androidpolice_data3
Status: Connected
Total Number of entries: 87
Number of entries in heal pending: 86
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick hive:/mnt/hive_block4/androidpolice_data3
Status: Connected
Total Number of entries: 87
Number of entries in heal pending: 86
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
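Since the heal summary itself takes minutes to return, it may help to capture it once and reduce it to per-brick numbers. Below is a minimal Python sketch that parses text in the format shown above. The names parse_heal_summary and pending_by_brick are made up for illustration, and the sample is trimmed from the output above.

```python
# Sketch only: parses text in the format printed by
# "gluster volume heal <VOLNAME> info summary" as shown above.
# Function names are hypothetical, not part of any gluster tooling.

SAMPLE = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1
Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""

PENDING = "Number of entries in heal pending"


def parse_heal_summary(text):
    """Return {brick: {field: value}}; purely numeric values become ints."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A "Brick host:/path" line starts a new stanza.
            current = line[len("Brick "):]
            bricks[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            bricks[current][key.strip()] = int(value) if value.isdigit() else value
    return bricks


def pending_by_brick(bricks):
    """Map each brick to its pending-heal count, skipping non-numeric values."""
    return {
        name: fields[PENDING]
        for name, fields in bricks.items()
        if isinstance(fields.get(PENDING), int)
    }


if __name__ == "__main__":
    for brick, count in pending_by_brick(parse_heal_summary(SAMPLE)).items():
        print(f"{count:5d} pending  {brick}")
```

Saving the command output to a file at intervals and feeding each snapshot through parse_heal_summary would show whether the pending counts are actually shrinking.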
Statistics showed a diminishing number of failed heals:
...
Ending time of crawl: Tue Apr 17 21:13:08 2018
Type of crawl: INDEX
No. of entries healed: 2
No. of entries in split-brain: 0
No. of heal failed entries: 102
Starting time of crawl: Tue Apr 17 21:13:09 2018
Ending time of crawl: Tue Apr 17 21:14:30 2018
Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 0
No. of heal failed entries: 91
Starting time of crawl: Tue Apr 17 21:14:31 2018
Ending time of crawl: Tue Apr 17 21:15:34 2018
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 88
...
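To make the trend in these crawl records easier to read, the statistics can be reduced to one row per crawl. This is a rough Python sketch that assumes the line layout shown above and English month and weekday names; parse_crawls and crawl_trend are hypothetical helper names, and the sample contains only the two complete records quoted above.

```python
from datetime import datetime

# Timestamp layout as it appears above, e.g. "Tue Apr 17 21:13:09 2018".
# Assumes English day/month names (C or en_* locale).
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

SAMPLE = """\
Starting time of crawl: Tue Apr 17 21:13:09 2018
Ending time of crawl: Tue Apr 17 21:14:30 2018
Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 0
No. of heal failed entries: 91
Starting time of crawl: Tue Apr 17 21:14:31 2018
Ending time of crawl: Tue Apr 17 21:15:34 2018
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 88
"""


def _parse_time(value):
    """Return a datetime, or None if the value is not a timestamp."""
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        return None


def parse_crawls(text):
    """Split the statistics text into one dict per crawl record."""
    crawls = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if not sep:
            continue
        if key == "Starting time of crawl":
            current = {"start": _parse_time(value)}
            crawls.append(current)
        elif current is None:
            continue  # ignore lines before the first complete record
        elif key == "Ending time of crawl":
            current["end"] = _parse_time(value)
        elif key == "No. of entries healed":
            current["healed"] = int(value)
        elif key == "No. of heal failed entries":
            current["failed"] = int(value)
    return crawls


def crawl_trend(crawls):
    """Return (duration_seconds, healed, failed) for each crawl."""
    rows = []
    for crawl in crawls:
        start, end = crawl.get("start"), crawl.get("end")
        seconds = (end - start).total_seconds() if start and end else None
        rows.append((seconds, crawl.get("healed"), crawl.get("failed")))
    return rows


if __name__ == "__main__":
    for seconds, healed, failed in crawl_trend(parse_crawls(SAMPLE)):
        print(f"duration={seconds}s healed={healed} failed={failed}")
```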
Eventually, everything healed, and things went back to a
state where at least the roof isn't on fire anymore.
The server stats and volume options were given in one of
the previous replies to this thread.
Any ideas, or commands I could run and share the output
of to help diagnose this? I'm also very open to working
with someone on the team in a live debugging session if
there's interest.
Thank you.
Sincerely,
Artem
--
Founder, Android Police <http://www.androidpolice.com>,
APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>
On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii
<[email protected]> wrote:
Hi Vlad,
I actually saw that post already and even asked a
question 4 days ago
(https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
The accepted answer also seems to go against your
suggestion to enable direct-io-mode, as it says it
should be disabled for better performance when used
just for file accesses.
It'd be great if someone from the Gluster team
chimed in on this thread.
Sincerely,
Artem
On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov
<[email protected]> wrote:
I wish I knew, or were able to get, a detailed
description of those options myself.
Here is a discussion of direct-io-mode:
https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
Like you, I ran tests on a large number of files and
found that the main delays are in attribute calls, so I
ended up with those mount options to improve
performance.
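One way to check whether attribute calls dominate on a given mount is to time a names-only directory read separately from a per-file lstat pass. The Python sketch below is only an illustration: time_listing is a hypothetical helper, and results on a FUSE mount will vary with kernel attribute caching (for example the attribute-timeout mount option), so repeated runs may look faster than the first.

```python
import os
import time


def time_listing(path):
    """Time reading directory entry names vs. calling lstat on each entry."""
    t0 = time.perf_counter()
    with os.scandir(path) as entries:
        # Reading names only; no explicit per-file attribute call here.
        names = [entry.name for entry in entries]
    t1 = time.perf_counter()
    for name in names:
        # One attribute call per file, similar in spirit to what "ls -l" needs.
        os.lstat(os.path.join(path, name))
    t2 = time.perf_counter()
    return {
        "entries": len(names),
        "readdir_s": t1 - t0,
        "lstat_s": t2 - t1,
    }


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    result = time_listing(target)
    print(f"{result['entries']} entries: "
          f"readdir {result['readdir_s']:.3f}s, lstat {result['lstat_s']:.3f}s")
```

Running it against the same directory on the raw brick filesystem and on the gluster mount would give a rough side-by-side comparison.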
I discovered those options basically by googling this
users list, where people share their tests.
I'm not sure I share your optimism: rather than
upgrading, I downgraded to 3.12 and have no dir view
issue now, though I had to recreate the cluster and
re-add the bricks with their existing data.
On Tue, Apr 10, 2018 at 1:47 AM, Artem
Russakovskii <[email protected]> wrote:
Hi Vlad,
I'm using only localhost: mounts.
Can you please explain what effect each
option has on the performance issues shown in
my posts?
"negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
From what I remember, direct-io-mode=enable
didn't make a difference in my tests, but I
suppose I can try again. The explanations of
direct-io-mode in various guides on the web
are quite confusing: they say enabling it
could make performance worse in some
situations and better in others due to the OS
file cache.
There are also these gluster volume
settings, adding to the confusion:
Option: performance.strict-o-direct
Default Value: off
Description: This option when set to off,
ignores the O_DIRECT flag.
Option: performance.nfs.strict-o-direct
Default Value: off
Description: This option when set to off,
ignores the O_DIRECT flag.
Re: 4.0. I moved to 4.0 after finding out
that it fixes the disappearing dirs bug
related to cluster.readdir-optimize, if you
remember
(http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
I was already on 3.13 by then, and 4.0
resolved the issue. It's been stable for me
so far, thankfully.
Sincerely,
Artem
On Mon, Apr 9, 2018 at 10:38 PM, Vlad
Kopylov <[email protected]> wrote:
You definitely need to add mount options to
/etc/fstab. Use the ones from here:
http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
I also went with local mounts to improve
performance.
Also, the 3.12 or 3.10 branches would be
preferable for production.
On Fri, Apr 6, 2018 at 4:12 AM, Artem
Russakovskii <[email protected]> wrote:
Hi again,
I'd like to expand on the performance issues and
plead for help. Here's one case which shows these
odd hiccups: https://i.imgur.com/CXBPjTK.gifv.
In this GIF where I switch back and
forth between copy operations on 2
servers, I'm copying a 10GB dir full
of .apk and image files.
On server "hive" I'm copying
straight from the main disk to an
attached volume block (xfs). As you
can see, the transfers are
relatively speedy and don't hiccup.
On server "citadel" I'm copying the same set of
data to a 4-replica gluster volume which uses
block storage as a brick. As you can see,
performance is much worse, and there are frequent
pauses of many seconds where nothing seems to be
happening, just freezes.
All 4 servers have the same specs,
and all of them have performance
issues with gluster and no such
issues when raw xfs block storage is
used.
hive has long finished copying the
data, while citadel is barely
chugging along and is expected to
take probably half an hour to an
hour. I have over 1TB of data to
migrate, at which point if we went
live, I'm not even sure gluster
would be able to keep up instead of
bringing the machines and services down.
Here's the cluster config, though performance
didn't seem any different before vs. after I
applied the customizations.
Volume Name: apkmirror_data1
Type: Replicate
Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
Brick2: forge:/mnt/forge_block1/apkmirror_data1
Brick3: hive:/mnt/hive_block1/apkmirror_data1
Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 5
network.remote-dio: enable
performance.rda-cache-limit: 256MB
performance.readdir-ahead: on
performance.parallel-readdir: on
network.inode-lru-limit: 500000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.io-thread-count: 32
server.event-threads: 4
client.event-threads: 4
performance.read-ahead: off
cluster.lookup-optimize: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
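To make customizations like the ones above reproducible (for example, to reapply them to a test volume and compare before vs. after), the option list can be turned into gluster volume set commands. This is a sketch, assuming the "key: value" layout shown above; parse_options and set_commands are hypothetical helper names, the sample holds only a few of the options listed, and the commands are only built here, not run.

```python
SAMPLE = """\
Options Reconfigured:
cluster.quorum-count: 1
network.ping-timeout: 5
performance.cache-size: 1GB
performance.io-thread-count: 32
"""


def parse_options(text):
    """Return {option: value} from 'key: value' lines, skipping headers."""
    options = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if sep and key and value:
            options[key] = value
    return options


def set_commands(volume, options):
    """Build one 'gluster volume set <VOLNAME> <KEY> <VALUE>' argv per option."""
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in options.items()
    ]


if __name__ == "__main__":
    for argv in set_commands("apkmirror_data1", parse_options(SAMPLE)):
        print(" ".join(argv))
```

Printing the commands instead of executing them leaves room to review each option before applying it.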
The mounts are done as follows in
/etc/fstab:
/dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
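If the mount options suggested elsewhere in this thread are tried, the glusterfs fstab entry above would grow a longer options field. The Python sketch below just assembles such a line. The option string is copied from the thread, its effect on performance is unverified here, and fstab_line is a hypothetical helper.

```python
# Options quoted in this thread; whether they help is untested here.
SUGGESTED_OPTIONS = [
    "defaults",
    "_netdev",
    "negative-timeout=10",
    "attribute-timeout=30",
    "fopen-keep-cache",
    "direct-io-mode=enable",
    "fetch-attempts=5",
]


def fstab_line(source, mountpoint, options, fstype="glusterfs"):
    """Join the six fstab fields: device, mountpoint, type, options, dump, pass."""
    return " ".join([source, mountpoint, fstype, ",".join(options), "0", "0"])


if __name__ == "__main__":
    print(fstab_line("localhost:/apkmirror_data1",
                     "/mnt/apkmirror_data1",
                     SUGGESTED_OPTIONS))
```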
I'm really not sure if
direct-io-mode mount tweaks would do
anything here, what the value should
be set to, and what it is by default.
The OS is OpenSUSE 42.3, 64-bit.
80GB of RAM, 20 CPUs, hosted by Linode.
I'd really appreciate any help in
the matter.
Thank you.
Sincerely,
Artem
On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii
<[email protected]> wrote:
Hi,
I'm trying to squeeze performance out of gluster
on 4 machines (80GB RAM and 20 CPUs each) where
Gluster runs on attached block storage (Linode)
as 4 replicate bricks, and so far everything I've
tried results in sub-optimal performance.
There are many files, several million, mostly
images. Many operations take minutes; copying
multiple files (even small ones) suddenly freezes
for seconds at a time, then continues; and iostat
frequently shows large r_await and w_await values
with 100% utilization for the attached block
device.
There are many guides out there for small-file
performance improvements, but they need more
explanation, and I think more tweaks should be
possible.
My question today is about
performance.cache-size. Is this the size of a
cache in RAM? If so, how do I view the current
cache size to see whether it gets full and I
should increase its size? Is it advisable to bump
it up if I have many tens of gigs of RAM free?
More generally, in the last 2
months since I first started
working with gluster and set a
production system live, I've
been feeling frustrated because
Gluster has a lot of
poorly-documented and confusing
options. I really wish
documentation could be improved
with examples and better
explanations.
Specifically, it'd be absolutely
amazing if the docs offered a
strategy for setting each value
and ways of determining more
optimal values. For example,
for performance.cache-size, if
it said something like "run
command abc to see your current
cache size, and if it's hurting,
up it, but be aware that it's
limited by RAM," it'd be already
a huge improvement to the docs.
And so on with other options.
The gluster team is quite helpful on this mailing
list, but in a reactive rather than proactive
way. Perhaps it's tunnel vision that sets in once
you've worked on a project for so long, where
less technical explanations and even proper
documentation of options take a back seat, but I
encourage you to be more proactive about helping
us understand and optimize Gluster.
Thank you.
Sincerely,
Artem
_______________________________________________
Gluster-users mailing list
[email protected]
<mailto:[email protected]>
http://lists.gluster.org/mailman/listinfo/gluster-users
<http://lists.gluster.org/mailman/listinfo/gluster-users>