Are there any other things I can try here, or have I dug up a bug that is still being looked at? We'd love to roll GlusterFS out into production use, but the performance with small files is unacceptable.
Ben
On Jun 5, 2009, at 9:02 AM, Benjamin Krein wrote:
On Jun 5, 2009, at 2:23 AM, Pavan Vilas Sondur wrote:
Hi Benjamin,
Can you provide us with the following information to narrow down the issue?
1. The network configuration of the GlusterFS deployment and the bandwidth that is available for GlusterFS to use.
All of these machines (servers and clients) are on a full-duplex gigabit LAN with gigabit NICs. As I've noted in previous emails, I can easily utilize the bandwidth with scp & rsync. As you'll see below, glusterfs also does very well with large files.
2. Does performance get affected the same way with large files too, or is it just with small files as you have mentioned? Let us know the performance of GlusterFS with large files.
Here are a bunch of tests I did with various configurations &
copying large files:
* Single server - cfs1 - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m23.726s
user 0m0.128s
sys 0m4.972s
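(Back-of-the-envelope: 5 x ~480MB is roughly 2.4GB in ~24s, or about 100MB/s, which is close to gigabit wire speed.)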
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           91.39MiB     63911      1.98MiB      27326
* Two servers w/ AFR only - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m43.354s
user 0m0.100s
sys 0m3.044s
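(Roughly 2.4GB in ~43s, or about 55MB/s, roughly half the single-server rate; with client-side AFR the client pushes every write to both servers over the same gigabit link, so that halving is what I'd expect.)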
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           57.73MiB     39931      910.95KiB    10733
* Two servers w/DHT+AFR - large files (~480MB each)
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/large_file_\$i.tar; done;"
`webform_cache.tar' -> `/mnt/large_file_1.tar'
`webform_cache.tar' -> `/mnt/large_file_2.tar'
`webform_cache.tar' -> `/mnt/large_file_3.tar'
`webform_cache.tar' -> `/mnt/large_file_4.tar'
`webform_cache.tar' -> `/mnt/large_file_5.tar'
real 0m43.294s
user 0m0.100s
sys 0m3.356s
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           58.15MiB     40224      1.52MiB      20174
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs2 (source: local)
  0   eth1           55.99MiB     38684      755.51KiB    9521
* Two servers DHT *only* - large files (~480MB each) - NOTE: only cfs1 was ever populated; isn't DHT supposed to distribute the files?
r...@dev1|~|# time sh -c "for i in 1 2 3 4 5; do cp -v webform_cache.tar /mnt/\$i/large_file.tar; done;"
`webform_cache.tar' -> `/mnt/1/large_file.tar'
`webform_cache.tar' -> `/mnt/2/large_file.tar'
`webform_cache.tar' -> `/mnt/3/large_file.tar'
`webform_cache.tar' -> `/mnt/4/large_file.tar'
`webform_cache.tar' -> `/mnt/5/large_file.tar'
real 0m40.062s
user 0m0.204s
sys 0m3.500s
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           112.96MiB    78190      1.66MiB      21994
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs2 (source: local)
  0   eth1           1019.00B     7          83.00B       0
r...@cfs1|/home/clusterfs/webform/cache2|# find -ls
294926      8 drwxrwxr-x 7 www-data www-data      4096 Jun  5 08:52 .
294927      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:53 ./1
294933 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./1/large_file.tar
294931      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:53 ./5
294932 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./5/large_file.tar
294930      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./4
294936 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./4/large_file.tar
294928      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./2
294934 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./2/large_file.tar
294929      8 drwxr-xr-x 2 root     root          4096 Jun  5 08:54 ./3
294935 489716 -rw-r--r-- 1 root     root     500971520 Jun  5 08:54 ./3/large_file.tar
r...@cfs2|/home/clusterfs/webform/cache2|# find -ls
3547150     8 drwxrwxr-x 7 www-data www-data      4096 Jun  5 08:52 .
3547153     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./3
3547155     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./5
3547154     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./4
3547152     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./2
3547151     8 drwxr-xr-x 2 root     root          4096 Jun  5 08:52 ./1
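If I understand DHT correctly, it places a file by hashing the file name against the parent directory's layout, so five files that are all named large_file.tar could easily end up on the same subvolume in every directory. Assuming the layout is stored in the trusted.glusterfs.dht xattr (my understanding; the name may differ by release), the hash ranges assigned to each brick could be checked directly on the backend directories, e.g.:

# inspect the DHT layout xattr on the backend dirs of both bricks
# (assumes the xattr is named trusted.glusterfs.dht in this release)
getfattr -n trusted.glusterfs.dht -e hex /home/clusterfs/webform/cache2/1
getfattr -n trusted.glusterfs.dht -e hex /home/clusterfs/webform/cache2/2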
There hasn't really been an issue with using large numbers of small files in the past. Nevertheless, we'll look into this once you give us the above details.
I don't feel that the tests I'm performing are abnormal or out of the ordinary, so I'm surprised that I'm the only one having these problems. As you can see from the large-file results above, the problem is clearly limited to small files.
Thanks for your continued interest in resolving this!
Ben
On 04/06/09 16:21 -0400, Benjamin Krein wrote:
Here are some more details with different configs:
* Only AFR between cfs1 & cfs2:
r...@dev1# time cp -rp * /mnt/
real 16m45.995s
user 0m1.104s
sys 0m5.528s
* Single server - cfs1:
r...@dev1# time cp -rp * /mnt/
real 10m33.967s
user 0m0.764s
sys 0m5.516s
* Stats via bmon on cfs1 during above copy:
#   Interface        RX Rate      RX #       TX Rate      TX #
────────────────────────────────────────────────────────────────
cfs1 (source: local)
  0   eth1           951.25KiB    1892       254.00KiB    1633
It gets progressively better, but that's still a *long* way from the <2 min times with scp and <1 min times with rsync! And I have no redundancy or distributed hash whatsoever.
* Client config for the last test:
-----
# Webform Flat-File Cache Volume client configuration
volume srv1
type protocol/client
option transport-type tcp
option remote-host cfs1
option remote-subvolume webform_cache_brick
end-volume
volume writebehind
type performance/write-behind
option cache-size 4mb
option flush-behind on
subvolumes srv1
end-volume
volume cache
type performance/io-cache
option cache-size 512mb
subvolumes writebehind
end-volume
-----
Ben
On Jun 3, 2009, at 4:33 PM, Vahriç Muhtaryan wrote:
To better understand the issue, did you try 4 servers with DHT only, or 2 servers with DHT only, or two servers with replication only, to find out the real problem? Maybe replication or DHT could have a bug?
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Benjamin
Krein
Sent: Wednesday, June 03, 2009 11:00 PM
To: Jasper van Wanrooy - Chatventure
Cc: [email protected]
Subject: Re: [Gluster-users] Horrible performance with small files
(DHT/AFR)
The current boxes I'm using for testing are as follows:
* 2x dual-core Opteron ~2GHz (x86_64)
* 4GB RAM
* 4x 7200 RPM 73GB SATA - RAID1+0 w/3ware hardware controllers
The server storage directories live in /home/clusterfs, where /home is an ext3 partition mounted with noatime.
These servers are not virtualized. They are running Ubuntu 8.04 LTS Server x86_64.
The files I'm copying are all <2k javascript files (plain text) stored in 100 hash directories in each of 3 parent directories:
/home/clusterfs/
+ parentdir1/
| + 00/
| | ...
| + 99/
+ parentdir2/
| + 00/
| | ...
| + 99/
+ parentdir3/
  + 00/
  | ...
  + 99/
There are ~10k of these <2k javascript files distributed throughout the above directory structure, totaling approximately 570MB. My tests have been copying that entire directory structure from a client machine into the glusterfs mountpoint on the client.
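For anyone who wants to reproduce a comparable workload, something along these lines should build a similar tree (the directory names, counts, and the 2KB file size are illustrative, not my exact data set):
-----
# build ~10k files of ~2KB spread over 3 parent dirs x 100 hash dirs
for p in parentdir1 parentdir2 parentdir3; do
  for h in $(seq -w 0 99); do
    mkdir -p "$p/$h"
    for f in $(seq 1 34); do
      head -c 2048 /dev/urandom > "$p/$h/file_$f.js"
    done
  done
done
-----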
Observing IO on both the client box & all the server boxes via iostat shows that the disks are doing *very* little work. Observing the CPU/memory load with top or htop shows that none of the boxes are CPU or memory bound. Observing the bandwidth in/out of the network interface shows <1MB/s throughput (we have a fully gigabit LAN!), which usually drops down to <150KB/s during the copy.
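Put differently: ~10k files in the roughly 21 minutes my last DHT+AFR copy took works out to about 8 files per second, i.e. on the order of 120ms per file, which to me smells like per-file round-trip latency rather than a bandwidth problem.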
As a comparison, scp'ing the same directory structure from the same client to one of the same servers runs at ~40-50MB/s sustained. Here are the results of copying the same directory structure using rsync to the same partition:
# time rsync -ap * b...@cfs1:~/cache/
b...@cfs1's password:
real 0m23.566s
user 0m8.433s
sys 0m4.580s
Ben
On Jun 3, 2009, at 3:16 PM, Jasper van Wanrooy - Chatventure wrote:
Hi Benjamin,
That's not good news. What kind of hardware do you use? Is it virtualised? Or do you use real boxes? What kind of files are you copying in your test? What performance do you have when copying it to a local dir?
Best regards Jasper
----- Original Message -----
From: "Benjamin Krein" <[email protected]>
To: "Jasper van Wanrooy - Chatventure"
<[email protected]>
Cc: "Vijay Bellur" <[email protected]>, [email protected]
Sent: Wednesday, 3 June, 2009 19:23:51 GMT +01:00 Amsterdam /
Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Gluster-users] Horrible performance with small files
(DHT/AFR)
I reduced my config to only 2 servers (had to donate 2 of the 4 to another project). I now have a single server using DHT (for future scaling) and AFR to a mirrored server. Copy times are much better, but still pretty horrible:
# time cp -rp * /mnt/
real 21m11.505s
user 0m1.000s
sys 0m6.416s
Ben
On Jun 3, 2009, at 3:13 AM, Jasper van Wanrooy - Chatventure wrote:
Hi Benjamin,
Did you also try with a lower thread-count? Actually I'm using 3 threads.
Best Regards Jasper
On 2 jun 2009, at 18:25, Benjamin Krein wrote:
I do not see any difference with autoscaling removed. Current
server config:
# webform flat-file cache
volume webform_cache
type storage/posix
option directory /home/clusterfs/webform/cache
end-volume
volume webform_cache_locks
type features/locks
subvolumes webform_cache
end-volume
volume webform_cache_brick
type performance/io-threads
option thread-count 32
subvolumes webform_cache_locks
end-volume
<<snip>>
# GlusterFS Server
volume server
type protocol/server
option transport-type tcp
subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
option auth.addr.dns_public_brick.allow 10.1.1.*
option auth.addr.dns_private_brick.allow 10.1.1.*
option auth.addr.webform_usage_brick.allow 10.1.1.*
option auth.addr.webform_cache_brick.allow 10.1.1.*
option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
option auth.addr.subs_exports_brick.allow 10.1.1.*
end-volume
# time cp -rp * /mnt/
real 70m13.672s
user 0m1.168s
sys 0m8.377s
NOTE: the above test was also done during peak hours when the LAN/dev server were in use, which would account for some of the extra time. This is still WAY too much, though.
Ben
On Jun 1, 2009, at 1:40 PM, Vijay Bellur wrote:
Hi Benjamin,
Could you please try by turning autoscaling off?
Thanks,
Vijay
Benjamin Krein wrote:
I'm seeing extremely poor performance writing small files to a
glusterfs DHT/AFR mount point. Here are the stats I'm seeing:
* Number of files:
r...@dev1|/home/aweber/cache|# find |wc -l
102440
* Average file size (bytes):
r...@dev1|/home/aweber/cache|# ls -lR | awk '{sum += $5; n++;} END {print sum/n;}'
4776.47
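(So the whole tree is roughly 102440 x ~4.8KB, or about 490MB.)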
* Using scp:
r...@dev1|/home/aweber/cache|# time scp -rp * b...@cfs1:~/cache/
real 1m38.726s
user 0m12.173s
sys 0m12.141s
* Using cp to glusterfs mount point:
r...@dev1|/home/aweber/cache|# time cp -rp * /mnt
real 30m59.101s
user 0m1.296s
sys 0m5.820s
Here is my configuration (currently a single client writing to 4 servers, with 2 DHT subvolumes each doing AFR):
SERVER:
# webform flat-file cache
volume webform_cache
type storage/posix
option directory /home/clusterfs/webform/cache
end-volume
volume webform_cache_locks
type features/locks
subvolumes webform_cache
end-volume
volume webform_cache_brick
type performance/io-threads
option thread-count 32
option max-threads 128
option autoscaling on
subvolumes webform_cache_locks
end-volume
<<snip>>
# GlusterFS Server
volume server
type protocol/server
option transport-type tcp
subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
option auth.addr.dns_public_brick.allow 10.1.1.*
option auth.addr.dns_private_brick.allow 10.1.1.*
option auth.addr.webform_usage_brick.allow 10.1.1.*
option auth.addr.webform_cache_brick.allow 10.1.1.*
option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
option auth.addr.subs_exports_brick.allow 10.1.1.*
end-volume
CLIENT:
# Webform Flat-File Cache Volume client configuration
volume srv1
type protocol/client
option transport-type tcp
option remote-host cfs1
option remote-subvolume webform_cache_brick
end-volume
volume srv2
type protocol/client
option transport-type tcp
option remote-host cfs2
option remote-subvolume webform_cache_brick
end-volume
volume srv3
type protocol/client
option transport-type tcp
option remote-host cfs3
option remote-subvolume webform_cache_brick
end-volume
volume srv4
type protocol/client
option transport-type tcp
option remote-host cfs4
option remote-subvolume webform_cache_brick
end-volume
volume afr1
type cluster/afr
subvolumes srv1 srv3
end-volume
volume afr2
type cluster/afr
subvolumes srv2 srv4
end-volume
volume dist
type cluster/distribute
subvolumes afr1 afr2
end-volume
volume writebehind
type performance/write-behind
option cache-size 4mb
option flush-behind on
subvolumes dist
end-volume
volume cache
type performance/io-cache
option cache-size 512mb
subvolumes writebehind
end-volume
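For reference, the client volfile above is what ends up mounted at /mnt for the cp test, along the lines of the following (the volfile path here is just illustrative):
glusterfs -f /etc/glusterfs/webform-cache-client.vol /mnt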
Benjamin Krein
www.superk.org
_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users