Are there any other things I can try here, or have I dug up a bug that is still being looked at? We'd love to roll GlusterFS out into production use, but the performance with small files is unacceptable.
Ben
On Jun 5, 2009, at 9:02 AM, Benjamin Krein wrote:
On Jun 5, 2009, at 2:23 AM, Pavan Vilas Sondur wrote:
Hi Benjamin,
Can you provide us with the following information to narrow down the issue?
1. The network configuration of the GlusterFS deployment and the bandwidth that is available for GlusterFS to use.
All of these machines (servers and clients) are on a full-duplex gigabit LAN with gigabit NICs. As I've noted in previous emails, I can easily utilize the bandwidth with scp & rsync. As you'll see below, glusterfs also does very well with large files.
2. Does performance get affected the same way with large files too, or is it just with small files as you have mentioned? Let us know the performance of GlusterFS with large files.
Here are a bunch of tests I did with various configurations &
copying large files:
* Single server - cfs1 - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m23.726s
user 0m0.128s
sys 0m4.972s
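(Back-of-the-envelope: 5 x ~480MB is roughly 2.4GB in ~24s, or about 100MB/s, which is close to gigabit wire speed.)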
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           91.39MiB     63911      1.98MiB      27326
* Two servers w/ AFR only - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m43.354s
user 0m0.100s
sys 0m3.044s
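(Roughly 2.4GB in ~43s, or about 55MB/s, roughly half the single-server rate; with client-side AFR the client pushes every write to both servers over the same gigabit link, so that halving is what I'd expect.)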
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           57.73MiB     39931      910.95KiB    10733
* Two servers w/DHT+AFR - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m43.294s
user 0m0.100s
sys 0m3.356s
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           58.15MiB     40224      1.52MiB      20174
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs2 (source: local)
  0   eth1           55.99MiB     38684      755.51KiB    9521
* Two servers DHT *only* - large files (~480MB each) - NOTE: only cfs1 was ever populated; isn't DHT supposed to distribute the files?
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/\$i/large_file.tar; done;"
`webform_cache.tar' -> `/mnt/1/large_file.tar'
`webform_cache.tar' -> `/mnt/2/large_file.tar'
`webform_cache.tar' -> `/mnt/3/large_file.tar'
`webform_cache.tar' -> `/mnt/4/large_file.tar'
`webform_cache.tar' -> `/mnt/5/large_file.tar'
real 0m40.062s
user 0m0.204s
sys 0m3.500s
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           112.96MiB    78190      1.66MiB      21994
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs2 (source: local)
  0   eth1           1019.00B     7          83.00B       0
r...@cfs1|/home/clusterfs/webform/cache2|# find -ls
294926      8 drwxrwxr-x 7 www-data www-data      4096 Jun  5 08:52 .
294927      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:53 ./1
294933 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./1/large_file.tar
294931      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:53 ./5
294932 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./5/large_file.tar
294930      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./4
294936 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./4/large_file.tar
294928      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./2
294934 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./2/large_file.tar
294929      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./3
294935 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./3/large_file.tar
r...@cfs2|/home/clusterfs/webform/cache2|# find -ls
3547150     8 drwxrwxr-x 7 www-data www-data      4096 Jun  5 08:52 .
3547153     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./3
3547155     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./5
3547154     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./4
3547152     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./2
3547151     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./1
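If I understand DHT correctly, it places a file by hashing the file name against the parent directory's layout, so five files that are all named large_file.tar could easily end up on the same subvolume in every directory. Assuming the layout is stored in the trusted.glusterfs.dht xattr (my understanding; the name may differ by release), the hash ranges assigned to each brick could be checked directly on the backend directories, e.g.:

# inspect the DHT layout xattr on the backend dirs of both bricks
# (assumes the xattr is named trusted.glusterfs.dht in this release)
getfattr -n trusted.glusterfs.dht -e hex /home/clusterfs/webform/cache2/1
getfattr -n trusted.glusterfs.dht -e hex /home/clusterfs/webform/cache2/2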
There hasn't really been an issue with using large numbers of small files in the past. Nevertheless, we'll look into this once you give us the above details.
I don't feel that the tests I'm performing are abnormal or out of the ordinary, so I'm surprised that I'm the only one having these problems. As you can see from the large-file results above, the problem is clearly limited to small files.
Thanks for your continued interest in resolving this!
Ben
On 04/06/09 16:21 -0400, Benjamin Krein wrote:
Here are some more details with different configs:
* Only AFR between cfs1 & cfs2:
r...@dev1# time cp -rp * /mnt/
real 16m45.995s
user 0m1.104s
sys 0m5.528s
* Single server - cfs1:
r...@dev1# time cp -rp * /mnt/
real 10m33.967s
user 0m0.764s
sys 0m5.516s
* Stats via bmon on cfs1 during above copy:
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           951.25KiB    1892       254.00KiB    1633
It gets progressively better, but that's still a *long* way from the <2 min times with scp and <1 min times with rsync! And I have no redundancy or distributed hash whatsoever.
* Client config for the last test:
-----
# Webform Flat-File Cache Volume client configuration
volume srv1
type protocol/client
option transport-type tcp
option remote-host cfs1
option remote-subvolume webform_cache_brick
end-volume
volume writebehind
type performance/write-behind
option cache-size 4mb
option flush-behind on
subvolumes srv1
end-volume
volume cache
type performance/io-cache
option cache-size 512mb
subvolumes writebehind
end-volume
-----
Ben
On Jun 3, 2009, at 4:33 PM, Vahriç Muhtaryan wrote:
To better understand the issue, did you try 4 servers with DHT only, or 2 servers with DHT only, or two servers with replication only, to find out the real problem? Maybe replication or DHT could have a bug?
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Benjamin
Krein
Sent: Wednesday, June 03, 2009 11:00 PM
To: Jasper van Wanrooy - Chatventure
Cc: [email protected]
Subject: Re: [Gluster-users] Horrible performance with small files
(DHT/AFR)
The current boxes I'm using for testing are as follows:
* 2x dual-core Opteron ~2GHz (x86_64)
* 4GB RAM
* 4x 7200 RPM 73GB SATA - RAID1+0 w/3ware hardware controllers
The server storage directories live in /home/clusterfs, where /home is an ext3 partition mounted with noatime.
These servers are not virtualized. They are running Ubuntu 8.04 LTS Server x86_64.
The files I'm copying are all <2k javascript files (plain text) stored in 100 hash directories in each of 3 parent directories:
/home/clusterfs/
+ parentdir1/
| + 00/
| | ...
| + 99/
+ parentdir2/
| + 00/
| | ...
| + 99/
+ parentdir3/
  + 00/
  | ...
  + 99/
There are ~10k of these <2k javascript files distributed throughout the above directory structure, totaling approximately 570MB. My tests have been copying that entire directory structure from a client machine into the glusterfs mountpoint on the client.
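For anyone who wants to reproduce a comparable workload, something along these lines should build a similar tree (the directory names, counts, and the 2KB file size are illustrative, not my exact data set):
-----
# build ~10k files of ~2KB spread over 3 parent dirs x 100 hash dirs
for p in parentdir1 parentdir2 parentdir3; do
  for h in $(seq -w 0 99); do
    mkdir -p "$p/$h"
    for f in $(seq 1 34); do
      head -c 2048 /dev/urandom > "$p/$h/file_$f.js"
    done
  done
done
-----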
Observing IO on both the client box & all the server boxes via iostat shows that the disks are doing *very* little work. Observing the CPU/memory load with top or htop shows that none of the boxes are CPU or memory bound. Observing the bandwidth in/out of the network interface shows <1MB/s throughput (we have a fully gigabit LAN!), which usually drops down to <150KB/s during the copy.
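Put differently: ~10k files in the roughly 21 minutes my last DHT+AFR copy took works out to about 8 files per second, i.e. on the order of 120ms per file, which to me smells like per-file round-trip latency rather than a bandwidth problem.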
As a comparison, scp'ing the same directory structure from the same client to one of the same servers runs at ~40-50MB/s sustained. Here are the results of copying the same directory structure using rsync to the same partition:
# time rsync -ap * b...@cfs1:~/cache/
b...@cfs1's password:
real 0m23.566s
user 0m8.433s
sys 0m4.580s
Ben
On Jun 3, 2009, at 3:16 PM, Jasper van Wanrooy - Chatventure wrote:
Hi Benjamin,
That's not good news. What kind of hardware do you use? Is it virtualised? Or do you use real boxes? What kind of files are you copying in your test? What performance do you have when copying it to a local dir?
Best regards Jasper
----- Original Message -----
From: "Benjamin Krein" <[email protected]>
To: "Jasper van Wanrooy - Chatventure"
<[email protected]>
Cc: "Vijay Bellur" <[email protected]>, [email protected]
Sent: Wednesday, 3 June, 2009 19:23:51 GMT +01:00 Amsterdam /
Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Gluster-users] Horrible performance with small files
(DHT/AFR)
I reduced my config to only 2 servers (had to donate 2 of the 4 to another project). I now have a single server using DHT (for future scaling) and AFR to a mirrored server. Copy times are much better, but still pretty horrible:
# time cp -rp * /mnt/
real 21m11.505s
user 0m1.000s
sys 0m6.416s
Ben
On Jun 3, 2009, at 3:13 AM, Jasper van Wanrooy - Chatventure wrote:
Hi Benjamin,
Did you also try with a lower thread-count? Actually I'm using 3 threads.
Best Regards Jasper
On 2 jun 2009, at 18:25, Benjamin Krein wrote:
I do not see any difference with autoscaling removed. Current
server config:
# webform flat-file cache
volume webform_cache
type storage/posix
option directory /home/clusterfs/webform/cache
end-volume
volume webform_cache_locks
type features/locks
subvolumes webform_cache
end-volume
volume webform_cache_brick
type performance/io-threads
option thread-count 32
subvolumes webform_cache_locks
end-volume
<<snip>>
# GlusterFS Server
volume server
type protocol/server
option transport-type tcp
subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
option auth.addr.dns_public_brick.allow 10.1.1.*
option auth.addr.dns_private_brick.allow 10.1.1.*
option auth.addr.webform_usage_brick.allow 10.1.1.*
option auth.addr.webform_cache_brick.allow 10.1.1.*
option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
option auth.addr.subs_exports_brick.allow 10.1.1.*
end-volume
# time cp -rp * /mnt/
real 70m13.672s
user 0m1.168s
sys 0m8.377s
NOTE: the above test was also done during peak hours when the LAN/dev server were in use, which would account for some of the extra time. This is still WAY too much, though.
Ben
On Jun 1, 2009, at 1:40 PM, Vijay Bellur wrote:
Hi Benjamin,
Could you please try by turning autoscaling off?
Thanks,
Vijay
Benjamin Krein wrote:
I'm seeing extremely poor performance writing small files to a
glusterfs DHT/AFR mount point. Here are the stats I'm seeing:
* Number of files:
r...@dev1|/home/aweber/cache|# find |wc -l
102440
* Average file size (bytes):
r...@dev1|/home/aweber/cache|# ls -lR | awk '{sum += $5; n++;} END {print sum/n;}'
4776.47
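(So the whole tree is roughly 102440 x ~4.8KB, or about 490MB.)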
* Using scp:
r...@dev1|/home/aweber/cache|# time scp -rp * b...@cfs1:~/cache/
real 1m38.726s
user 0m12.173s
sys 0m12.141s
* Using cp to glusterfs mount point:
r...@dev1|/home/aweber/cache|# time cp -rp * /mnt
real 30m59.101s
user 0m1.296s
sys 0m5.820s
Here is my configuration (currently a single client writing to 4 servers, with 2 DHT subvolumes each doing AFR):
SERVER:
# webform flat-file cache
volume webform_cache
type storage/posix
option directory /home/clusterfs/webform/cache
end-volume
volume webform_cache_locks
type features/locks
subvolumes webform_cache
end-volume
volume webform_cache_brick
type performance/io-threads
option thread-count 32
option max-threads 128
option autoscaling on
subvolumes webform_cache_locks
end-volume
<<snip>>
# GlusterFS Server
volume server
type protocol/server
option transport-type tcp
subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
option auth.addr.dns_public_brick.allow 10.1.1.*
option auth.addr.dns_private_brick.allow 10.1.1.*
option auth.addr.webform_usage_brick.allow 10.1.1.*
option auth.addr.webform_cache_brick.allow 10.1.1.*
option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
option auth.addr.subs_exports_brick.allow 10.1.1.*
end-volume
CLIENT:
# Webform Flat-File Cache Volume client configuration
volume srv1
type protocol/client
option transport-type tcp
option remote-host cfs1
option remote-subvolume webform_cache_brick
end-volume
volume srv2
type protocol/client
option transport-type tcp
option remote-host cfs2
option remote-subvolume webform_cache_brick
end-volume
volume srv3
type protocol/client
option transport-type tcp
option remote-host cfs3
option remote-subvolume webform_cache_brick
end-volume
volume srv4
type protocol/client
option transport-type tcp
option remote-host cfs4
option remote-subvolume webform_cache_brick
end-volume
volume afr1
type cluster/afr
subvolumes srv1 srv3
end-volume
volume afr2
type cluster/afr
subvolumes srv2 srv4
end-volume
volume dist
type cluster/distribute
subvolumes afr1 afr2
end-volume
volume writebehind
type performance/write-behind
option cache-size 4mb
option flush-behind on
subvolumes dist
end-volume
volume cache
type performance/io-cache
option cache-size 512mb
subvolumes writebehind
end-volume
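For reference, the client volfile above is what ends up mounted at /mnt for the cp test, along the lines of the following (the volfile path here is just illustrative):
glusterfs -f /etc/glusterfs/webform-cache-client.vol /mnt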
Benjamin Krein
www.superk.org
_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users