> On 3 Feb 2017, at 13:48, Gambit15 <[email protected]> wrote:
>
> Hi Alex,
> I don't use Gluster for storing large amounts of small files, however from
> what I've read, that does appear to be its big Achilles heel.

I am not an expert, but I agree: because of Gluster's distributed nature, the per-file access latency plays a big role when you have to deal with a lot of small files. There do seem to be some tuning options available, though; a good place to start could be:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html

> Personally, if you're not looking to scale out to a lot more servers, I'd go
> with Ceph or DRBD. Gluster's best features are in its scalability.

AFAIK Ceph needs at least 3 monitors (to form a quorum) to be fully "highly available", so the entry cost is pretty high and, from my point of view, overkill for such needs, unless you plan to scale out too. DRBD seems a more reasonable approach.

Cheers

> Also, it's worth pointing out that in any setup, you've got to be careful
> with 2 node configurations as they're highly vulnerable to split-brain
> scenarios.
>
> Given the relatively small size of your data, caching tweaks & an arbiter may
> well save you here, however I don't use enough of its caching features to be
> able to give advice on it.
>
> D
>
> On 3 February 2017 at 08:28, Alex Sudakar <[email protected]> wrote:
> Hi. I'm looking for a clustered filesystem for a very simple
> scenario. I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
>
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes. Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files. Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty. :-)
>
> So I've set up two VMs with CentOS 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client. One brick on each
> machine, one volume in a 1 x 2 replica configuration.
>
> Everything works, it's just the performance penalty which is a surprise. :-)
>
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB data, 85MB allocated; an average size of 8KB data
> per file. The brick's files have a total of 117MB allocated, with the
> extra 32MB working out pretty much to be exactly the sum of the extra
> 4KB extents that would have been allocated for the XFS attributes per
> file - the VMs were installed with the default 256 byte inode size for
> the local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes. 'xfs_bmap' on a
> few files shows this is the case.
>
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds. A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although not quite certain; the additional extents added about 40%
> extra to the disk block allocation, so I'm unsure as to why the time
> increase was 100%).
>
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
>
> It took 30 seconds until I applied the 'md-cache' settings (for those
> variables that still exist in 3.8.8) mentioned in this very helpful
> article:
>
> http://blog.gluster.org/category/performance/
>
> So use of the md-cache in a 'cold run' shaved off 5 seconds - due to
> common directory LOOKUP operations being cached I guess.
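
For anyone who wants to repeat the 'cat' timing comparison above in a repeatable way, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the test Alex actually ran: the function name `time_read_all` and the example path in the comment are hypothetical. It walks a directory tree, reads every regular file, and reports the file count, bytes read and elapsed time, so the same tree can be timed on the native filesystem, on the brick, and through the glusterfs mount.

```python
import os
import sys
import time


def time_read_all(root):
    """Read every regular file under root; return (files, bytes, seconds)."""
    files = 0
    total_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files are read; symlinks and special files are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                total_bytes += len(fh.read())
            files += 1
    return files, total_bytes, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: python3 readtest.py /mnt/g1/testdir
    for root in sys.argv[1:]:
        n, size, secs = time_read_all(root)
        per_file_ms = 1000.0 * secs / n if n else 0.0
        print(f"{root}: {n} files, {size} bytes, {secs:.2f}s, {per_file_ms:.2f} ms/file")
```

The results depend heavily on whether the caches are warm, so it only makes sense to compare cold runs with cold runs and warm runs with warm runs. Note also that pointing it at a brick directory would include whatever lives under '.glusterfs' unless that is excluded.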
>
> Output of a 'volume info' is as follows:
>
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 90000
>
> The article suggests a value of 600 for
> performance.md-cache-timeout but my Gluster version only
> permits a maximum value of 60.
>
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
> bandwidth is a factor.
>
> The roughly fourfold slowdown is no doubt the penalty incurred in moving
> to a proper distributed filesystem. That article and other web pages I've
> read all say that each open of a file results in synchronous LOOKUP
> operations on all the replicas, so I'm guessing it just takes that
> much time for everything to happen before a file can be opened.
> Gluster profiling shows that there are 11,198 LOOKUP operations on the
> test cat of the 7,987 files.
>
> As a Gluster newbie I'd appreciate some quick advice if possible -
>
> 1. Is this sort of performance hit - on directories of small files -
> typical for such a simple Gluster configuration?
>
> 2. Is there anything I can do to speed things up? :-)
>
> 3. Repeating the 'cat' test immediately after the first test run saw
> the time dive from 25 seconds down to 4 seconds. Before I'd set those
> md-cache variables it had taken 17 seconds, due, I assume, to the
> actual file data being cached in the Linux buffer cache. So those
> md-cache settings really did make a change - taking off another 13
> seconds - once everything was cached.
>
> Flushing/invalidating the Linux memory cache made the next test go
> back to the 25 seconds. So it seems to me that the md-cache must hold
> its contents in the Linux memory buffers cache ... which surprised me,
> because I thought a user-space system like Gluster would have the
> cache within the daemons or maybe a shared memory segment, nothing
> that would be affected by clearing the Linux buffer cache. I was
> expecting a run after invalidating the Linux cache would take
> something between 4 seconds and 25 seconds, with the md-cache still
> primed but the file data expired.
>
> Just out of curiosity about how the md-cache is implemented ... why does
> clearing the Linux buffers seem to affect it?
>
> 4. The documentation says that Geo Gluster does 'asynchronous
> replication', which is something that would really help, but that it's
> 'master/slave', so I'm assuming that Geo Gluster won't fulfill my
> requirements of both servers being able to occasionally
> write/modify/delete files?
>
> 5. In my brick directory I have a '.trashcan' subdirectory - which is
> documented - but also a '.glusterfs' directory, which seems to have
> lots of magical files in some sort of housekeeping structure.
> Surprisingly the total amount of data under .glusterfs is greater than
> the total size of the actual files in my test directory. I haven't
> seen a description of what .glusterfs is used for ... are they vital
> to the operation of Gluster, or can they be deleted? Just curious.
> At one stage I had 1.1 GB of files in my volume, which expanded to be
> 1.5GB in the brick (due to the metadata extents) and a whopping 1.6GB
> of extra data materialized under the .glusterfs directory!
>
> 6. Since I'm using CentOS I try to stick with things that are
> available through the Red Hat repository channel ... so in looking
> for distributed filesystems I saw mention of Ceph. Because I wanted
> only a simple replicated filesystem it seemed to me that Ceph - being
> based/focused on 'object' storage? - wouldn't be as good a fit as
> Gluster. Evil question to a Gluster mailing list - will Ceph give me
> any significantly better performance in reading small files?
>
> I've tried to investigate and find out what I can but I could be
> missing something really obvious in my ignorance, so I would
> appreciate any quick tips/answers from the experts. Thanks!
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://lists.gluster.org/mailman/listinfo/gluster-users
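
To put the per-file latency point in numbers, here is a small back-of-the-envelope sketch in Python. Every input is a figure quoted in Alex's message above; the variable names and the `per_file_ms` helper are my own and purely illustrative. It only divides the reported totals by the file count, so treat the results as rough averages rather than measurements.

```python
# All inputs are figures quoted in the message above.
files = 7987        # actual files in the test directory
data_mb = 63        # total data size in MB
native_s = 3.0      # 'cat' on the native XFS layout, seconds
brick_s = 6.4       # 'cat' directly on the brick directory, seconds
mount_s = 25.0      # 'cat' through the glusterfs mount (cold run), seconds
lookups = 11198     # LOOKUP operations reported by Gluster profiling
extent_kb = 4       # extra extent allocated per file for the attributes


def per_file_ms(seconds, count=files):
    """Average wall-clock cost per file, in milliseconds."""
    return 1000.0 * seconds / count


extra_mb = files * extent_kb / 1024.0   # space taken by the extra extents
avg_kb = data_mb * 1024.0 / files       # average data size per file
lookups_per_file = lookups / files      # LOOKUPs issued per file read
slowdown = mount_s / brick_s            # mount versus brick

print(f"extra extents:    {extra_mb:.1f} MB")
print(f"average file:     {avg_kb:.1f} KB")
print(f"native XFS:       {per_file_ms(native_s):.2f} ms/file")
print(f"brick directory:  {per_file_ms(brick_s):.2f} ms/file")
print(f"glusterfs mount:  {per_file_ms(mount_s):.2f} ms/file")
print(f"LOOKUPs per file: {lookups_per_file:.2f}")
print(f"mount vs brick:   {slowdown:.1f}x")
```

On those figures the mount costs a little over 3 ms per file against under 1 ms on the brick, with about 1.4 LOOKUPs per file. With files averaging only about 8KB, that per-file overhead dominates long before the 120 MBytes/sec link matters, and the extra extents do add up to roughly the 32MB Alex observed.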
