Re: [Gluster-users] very bad performance on small files

Pan, Henry Sat, 15 Jan 2011 07:31:17 -0800

Hello Gluster Gurus,

I'm trying to find out what performance numbers you see when running an 
eDiscovery search application against a namespace with over 3 billion small 
files on GlusterFS...

Thanks & Good w/e

Henry PAN
Sr. Data Storage Eng/Adm
Iron Mountain
650-962-6184 (o)
650-930-6544 (c)
[email protected]

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
[email protected]
Sent: Saturday, January 15, 2011 1:20 AM
To: [email protected]
Subject: Gluster-very bad performance on small files

Send Gluster-users mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."

Today's Topics:

   1. Re: very bad performance on small files (Marcus Bointon)
   2. Re: very bad performance on small files (Joe Landman)
   3. Re: very bad performance on small files (Max Ivanov)
   4. Re: very bad performance on small files (Joe Landman)
   5. Re: very bad performance on small files (Marcus Bointon)
   6. Re: very bad performance on small files (Joe Landman)
   7. Re: very bad performance on small files (Max Ivanov)
   8. Re: very bad performance on small files (Rudi Ahlers)

----------------------------------------------------------------------

Message: 1
Date: Fri, 14 Jan 2011 22:50:37 +0100
From: Marcus Bointon <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: Gluster General Discussion List <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

On 14 Jan 2011, at 18:58, Jacob Shucart wrote:

> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up since 1
> request to the web server results in a bunch of file requests.

I think the main objection is that it takes a huge amount of network latency to 
explain a > 1,500% overhead with only 2 machines.

On 14 Jan 2011, at 15:20, Joe Landman wrote:

> MB size or larger

So does gluster become faster abruptly when file sizes cross some threshold? Or 
are average speeds proportional to file size? It would be good to see a wider 
spread of values on benchmarks of throughput vs file size for the same overall 
volume (like Max's data but with more intermediate values).

Marcus

------------------------------

Message: 2
Date: Fri, 14 Jan 2011 17:12:01 -0500
From: Joe Landman <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 04:50 PM, Marcus Bointon wrote:
> On 14 Jan 2011, at 18:58, Jacob Shucart wrote:
>
>> This kind of thing is fine on local disks, but when you're talking
>> about a distributed filesystem the network latency starts to add up
>> since 1 request to the web server results in a bunch of file
>> requests.
>
> I think the main objection is that it takes a huge amount of network
> latency to explain a > 1,500% overhead with only 2 machines.

If most of your file access times are dominated by latency (e.g. small,
seeky loads), and you are going over a gigabit connection, yeah,
your performance is going to crater on any cluster file system.

Local latency to traverse the storage stack is on the order of 10's of
microseconds.  Physical latency of the disk medium is on the order of
10's of microseconds for RAMdisk, 100's of microseconds for flash/SSD,
and 1000's of microseconds (i.e. milliseconds) for spinning rust.

Now take 1 million small file writes.  Say 1024 bytes.  These million
writes have to traverse the storage stack in the kernel to get to disk.

Now add in a network latency event on the order of 1000's of
microseconds for the remote storage stack and network stack to respond.

I haven't measured it yet in a methodical manner, but I wouldn't be
surprised to see IOP rates within a factor of 2 of the bare metal for a
sufficiently fast network such as Infiniband, and within a factor of 4
or 5 for a slow network like Gigabit.

Our own experience has been generally that you are IOP constrained
because of the stack you have to traverse.  If you add more latency into
this stack, you have more to traverse, and therefore, you have more you
need to wait.  Which will have a magnification effect upon times for
small IO ops which are seeky (stat, small writes, random ops).
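
A rough sketch of the arithmetic above, in Python. It assumes every small
op is strictly serial and pays the stack, media, and network latency in
turn. The specific microsecond figures are illustrative picks from the
ranges quoted above, not measurements, and the function names are made up
for this example.

```python
# Back-of-envelope model: every small op pays each latency in sequence.
# All figures below are illustrative assumptions, not measurements.


def seconds_for_ops(n_ops, stack_us, media_us, network_us=0.0):
    """Wall time in seconds for n_ops strictly serial operations."""
    per_op_us = stack_us + media_us + network_us
    return n_ops * per_op_us / 1_000_000


def slowdown(stack_us, media_us, network_us):
    """How many times slower each op gets once the network hop is added."""
    local_us = stack_us + media_us
    return (local_us + network_us) / local_us


N_OPS = 1_000_000   # "1 million small file writes"
STACK_US = 50       # assumed value for "10's of microseconds"
NETWORK_US = 2000   # assumed value for "1000's of microseconds"
MEDIA_US = {"ramdisk": 50, "ssd": 200, "spinning disk": 5000}  # assumed

for name, media_us in MEDIA_US.items():
    local = seconds_for_ops(N_OPS, STACK_US, media_us)
    remote = seconds_for_ops(N_OPS, STACK_US, media_us, NETWORK_US)
    factor = slowdown(STACK_US, media_us, NETWORK_US)
    print(f"{name:14s} local {local:7.0f} s  remote {remote:7.0f} s  x{factor:.1f}")
```

Under these assumptions the network hop matters most when the medium is
fast, since it then dominates the per-op time. The model ignores
caching, batching, and parallel requests, all of which change the picture
on a real system.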

>
> On 14 Jan 2011, at 15:20, Joe Landman wrote:
>
>> MB size or larger
>
> So does gluster become faster abruptly when file sizes cross some
> threshold? Or are average speeds proportional to file size? Would

It's a continuous curve, and very much user load specific.  The fewer
seeky operations you do, the better (true of all cluster file systems).

> be good to see a wider spread of values on benchmarks of throughput
> vs file size for the same overall volume (like Max's data but with
> more intermediate values)

I haven't seen Max's data, so I can't comment on this.  Understand that
performance is going to be bound by many things.  One of many things is
the speed of the spinning disk if that's what you use.  Another will be
the network.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: [email protected]
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

------------------------------

Message: 3
Date: Fri, 14 Jan 2011 22:19:58 +0000
From: Max Ivanov <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: [email protected]
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=UTF-8

> I haven't seen Max's data, so I can't comment on this.  Understand that
> performance is going to be bound by many things.  One of many things is the
> speed of the spinning disk if that's what you use.  Another will be network.
>

It is very similar to a kernel source tree: tons of small (2-20 KB)
files, 1.1 GB in total.
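
For anyone who wants to try a workload of roughly this shape on their own
mount, here is a minimal Python sketch. It is not the script used for the
numbers in this thread; the function names, the 16-subdirectory layout,
and the file naming are all made up for illustration. Only the 2-20 KB
size range comes from the description above.

```python
import os
import random
import tempfile
import time


def write_small_files(root, count, min_size=2 * 1024, max_size=20 * 1024, seed=0):
    """Create `count` files of random size under root; return (bytes, seconds)."""
    rng = random.Random(seed)
    total = 0
    start = time.perf_counter()
    for i in range(count):
        size = rng.randint(min_size, max_size)
        subdir = os.path.join(root, f"d{i % 16:02d}")  # hypothetical layout
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i:06d}"), "wb") as fh:
            fh.write(b"\0" * size)
        total += size
    return total, time.perf_counter() - start


def read_all_files(root):
    """Read every file under root once; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                total += len(fh.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Point `dir=` at the mount under test; the default temp dir is a stand-in.
    with tempfile.TemporaryDirectory() as root:
        written, w_secs = write_small_files(root, 1000)
        read, r_secs = read_all_files(root)
        print(f"wrote {written} bytes in {w_secs:.2f} s")
        print(f"read  {read} bytes in {r_secs:.2f} s")
```

Note that reading straight after writing will mostly hit the client page
cache, so for a fair read number you would want to remount or drop caches
between the two passes.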

------------------------------

Message: 4
Date: Fri, 14 Jan 2011 17:20:58 -0500
From: Joe Landman <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 05:19 PM, Max Ivanov wrote:
>> I haven't seen Max's data, so I can't comment on this.  Understand that
>> performance is going to be bound by many things.  One of many things is the
>> speed of the spinning disk if that's what you use.  Another will be network.
>>
>
> It is very similar to a kernel source tree: tons of small (2-20 KB)
> files, 1.1 GB in total.

Ok, worth looking into

> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


------------------------------

Message: 5
Date: Sat, 15 Jan 2011 00:26:53 +0100
From: Marcus Bointon <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: Gluster General Discussion List <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

On 14 Jan 2011, at 23:12, Joe Landman wrote:

> If most of your file access times are dominated by latency (e.g. small, seeky 
> loads), and you are going over a gigabit connection, yeah, your 
> performance is going to crater on any cluster file system.
>
> Local latency to traverse the storage stack is on the order of 10's of 
> microseconds.  Physical latency of the disk medium is on the order of 10's of 
> microseconds for RAMdisk, 100's of microseconds for flash/SSD, and 1000's of 
> microseconds (i.e. milliseconds) for spinning rust.
>
> Now take 1 million small file writes.  Say 1024 bytes.  These million writes 
> have to traverse the storage stack in the kernel to get to disk.
>
> Now add in a network latency event on the order of 1000's of microseconds for 
> the remote storage stack and network stack to respond.
>
> I haven't measured it yet in a methodical manner, but I wouldn't be surprised 
> to see IOP rates within a factor of 2 of the bare metal for a sufficiently 
> fast network such as Infiniband, and within a factor of 4 or 5 for a slow 
> network like Gigabit.
>
> Our own experience has been generally that you are IOP constrained because of 
> the stack you have to traverse.  If you add more latency into this stack, you 
> have more to traverse, and therefore, you have more you need to wait.  Which 
> will have a magnification effect upon times for small IO ops which are seeky 
> (stat, small writes, random ops).

Sure, and all that applies equally to both NFS and gluster, yet in Max's 
example NFS was ~50x faster than gluster for an identical small-file workload. 
So what's gluster doing over and above what NFS is doing that's taking so long, 
given that network and disk factors are equal? I'd buy a factor of 2 for 
replication, but not 50.

In case you missed what I'm on about, it was these stats that Max posted:

> Here are the results per command:
>
> dd if=/dev/zero of=M/tmp bs=1M count=16384
>     69.2 MB/sec (native)  69.2 MB/sec (FUSE)  52 MB/sec (NFS)
> dd if=/dev/zero of=M/tmp bs=1K count=163840000
>     88.1 MB/sec (native)  1.1 MB/sec (FUSE)   52.4 MB/sec (NFS)
> time tar cf - M | pv > /dev/null
>     15.8 MB/sec (native)  3.48 MB/sec (FUSE)  254 Kb/sec (NFS)

In my case I'm running 30kiops SSDs over gigabit. At the moment my problem 
(running 3.0.6) isn't performance but reliability - files are occasionally 
reported as 'vanished' by front-end apps (like rsync) even though they are 
present on both backing stores; no errors in gluster logs, self-heal doesn't 
help.

Marcus

------------------------------

Message: 6
Date: Fri, 14 Jan 2011 18:51:39 -0500
From: Joe Landman <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 06:26 PM, Marcus Bointon wrote:

>> Our own experience has been generally that you are IOP constrained
>> because of the stack you have to traverse.  If you add more latency
>> into this stack, you have more to traverse, and therefore, you have
>> more you need to wait.  Which will have a magnification effect upon
>> times for small IO ops which are seeky (stat, small writes, random
>> ops).
>
> Sure, and all that applies equally to both NFS and gluster, yet in
> Max's example NFS was ~50x faster than gluster for an identical
> small-file workload. So what's gluster doing over and above what NFS
> is doing that's taking so long, given that network and disk factors
> are equal? I'd buy a factor of 2 for replication, but not 50.

If the NFS was doing attribute caching and the GlusterFS implementation
had stat prefetch and other caching turned off, this could explain it.
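
To make the caching point concrete, here is a toy Python sketch of a
time-limited attribute cache in front of stat(). It is only an
illustration of the idea, not how the NFS client or the GlusterFS
stat-prefetch translator is actually implemented; the class name, the
3-second TTL, and the injectable `stat_fn`/`clock` parameters are all
assumptions made for this example.

```python
import os
import time


class AttrCache:
    """Serve repeated stat() calls from memory until an entry expires."""

    def __init__(self, ttl_seconds=3.0, stat_fn=os.stat, clock=time.monotonic):
        self._ttl = ttl_seconds      # illustrative value, not a real default
        self._stat_fn = stat_fn      # the expensive call (a round trip)
        self._clock = clock
        self._entries = {}           # path -> (time fetched, stat result)
        self.hits = 0
        self.misses = 0

    def stat(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1           # answered locally, no round trip
            return entry[1]
        self.misses += 1             # expired or unseen: pay the full latency
        result = self._stat_fn(path)
        self._entries[path] = (now, result)
        return result


if __name__ == "__main__":
    cache = AttrCache()
    for _ in range(1000):
        cache.stat(".")
    print(f"hits={cache.hits} misses={cache.misses}")
```

With a cache like this in the path, a stat-heavy workload only pays the
network latency on misses, which is the kind of difference that could
separate two otherwise similar setups. The price is that attributes can
be stale for up to the TTL.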

> In case you missed what I'm on about, it was these stats that Max
> posted:
>
>> Here are the results per command:
>>
>> dd if=/dev/zero of=M/tmp bs=1M count=16384
>>     69.2 MB/sec (native)  69.2 MB/sec (FUSE)  52 MB/sec (NFS)
>> dd if=/dev/zero of=M/tmp bs=1K count=163840000
>>     88.1 MB/sec (native)  1.1 MB/sec (FUSE)   52.4 MB/sec (NFS)
>> time tar cf - M | pv > /dev/null
>>     15.8 MB/sec (native)  3.48 MB/sec (FUSE)  254 Kb/sec (NFS)

Ok, I am not sure if I saw the numbers before.  Thanks.

>
> In my case I'm running 30kiops SSDs over gigabit. At the moment my
> problem (running 3.0.6) isn't performance but reliability - files are
> occasionally reported as 'vanished' by front-end apps (like rsync)
> even though they are present on both backing stores; no errors in
> gluster logs, self-heal doesn't help.

Check your stat-prefetch settings, and your time base.  We've had some
strange issues that seem to be correlated with time bases drifting.
Including files disappearing.  We have a few open tickets on this.

The way we've worked around this problem is to abandon the NFS client
and use the glusterfs client.  Not our preferred option, but it provides
a workaround for the moment.  The NFS translator does appear to have a
few issues.  I am hoping we get more tuning knobs for it soon so we can
see if we can work around this.

Regards,

Joe



------------------------------

Message: 7
Date: Sat, 15 Jan 2011 00:30:15 +0000
From: Max Ivanov <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: Marcus Bointon <[email protected]>
Cc: Gluster General Discussion List <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=UTF-8

> Sure, and all that applies equally to both NFS and gluster, yet in Max's 
> example NFS was ~50x faster than gluster for an identical small-file 
> workload. So what's gluster doing over and above what NFS is doing that's 
> taking so long, given that network and disk factors are equal? I'd buy a 
> factor of 2 for replication, but not 50.
>

Sorry if I didn't make it clear, but the NFS in my tests is not the
well-known classic NFS but glusterfs in NFS mode.

------------------------------

Message: 8
Date: Sat, 15 Jan 2011 11:18:22 +0200
From: Rudi Ahlers <[email protected]>
Subject: Re: [Gluster-users] very bad performance on small files
To: Jacob Shucart <[email protected]>
Cc: [email protected]
Message-ID:

<sig.3996530d0f.AANLkTinY=zubjghto470ygtwhd_vzbb6fpj4-we+m...@mail.gmail.com>

Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 14, 2011 at 7:58 PM, Jacob Shucart <[email protected]> wrote:
> For web hosting it is best to put user-generated content (images, etc.) on
> Gluster but to leave application files like PHP files on the local disk.
> This is because a single application file request could result in 20 other
> file requests since applications like PHP use includes/inherits, etc.
> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up since 1
> request to the web server results in a bunch of file requests.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Max Ivanov
> Sent: Friday, January 14, 2011 6:09 AM
> To: Burnash, James
> Cc: [email protected]
> Subject: Re: [Gluster-users] very bad performance on small files
>
>> Gluster - and in fact most (all?) parallel filesystems are optimized for
> very large files. That being the case, small files are not retrieved as
> efficiently, and result in a larger number of file operations in total
> because there are a fixed number for each file accessed.
>
>
> Which makes glusterfs performance unacceptable for web hosting purposes =(

So what can one use for webhosting purposes?

We use Xen / KVM virtual machines hosted on NAS devices, but the NAS
devices don't have an easy upgrade path. We literally have to rsync
all the data to the new device, then shut down all the machines on
the old one and restart them on the new one. They don't provide 100%
uptime either. So I'm looking for something with an easier upgrade path
(GlusterFS can do this) and better uptime (again, GlusterFS can do
this).

But it's clear that GlusterFS isn't made for small files, so what else
could work well for us?
--
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

------------------------------

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

End of Gluster-users Digest, Vol 33, Issue 23
*********************************************

