On 02/28/2013 05:58 AM, Torbjørn Thorsen wrote:
> On Wed, Feb 27, 2013 at 9:46 PM, Brian Foster <bfos...@redhat.com> wrote:
>> On 02/27/2013 10:14 AM, Torbjørn Thorsen wrote:
>>> I'm seeing less-than-stellar performance on my Gluster deployment
>>> when hosting VM images on the FUSE mount.
...
>
> I'm not familiar with the profiling feature, but I think I'm seeing
> the same thing: requests being fractured into smaller ones.
>
Gluster profiling is pretty straightforward. Just run the commands as
described here and you can dump some stats on the workload the volume
is seeing:

http://www.gluster.org/community/documentation/index.php/Gluster_3.2:_Running_Gluster_Volume_Profile_Command

The 'info' command prints the stats accumulated since the last 'info'
invocation, so you can easily compare results between different
workloads, provided the volume is otherwise idle.

> However, by chance I found something which seems to impact the
> performance even more.
> I wanted to retry the dd-to-loop-device-with-sync today, the same one
> I pasted yesterday.
> However, today it was quite different.
>
> torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo dd
> if=/dev/zero of=/dev/loop1 bs=1024k count=2000 oflag=sync
> 303038464 bytes (303 MB) copied, 123.95 s, 2.4 MB/s
> ^C
>

I started testing on a slightly more up-to-date VM. I'm seeing fairly
consistent 10MB/s with sync I/O. This is with a loop device over a file
on a locally mounted gluster volume.

> So I unmounted the loop device and mounted it again, and re-ran the test.
>
> torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo losetup -d /dev/loop1
> torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo losetup -f loopback.img
> torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo dd
> if=/dev/zero of=/dev/loop1 bs=1024k count=2000 oflag=sync
> 2097152000 bytes (2.1 GB) copied, 55.9117 s, 37.5 MB/s
>

I can reproduce something like this when dealing with non-sync I/O.
Smaller overall writes (relative to available cache) run much faster,
and larger writes tend to normalize to a lower value. Using xfs_io
instead of dd shows that writes are in fact hitting the cache (e.g.,
smaller writes complete at 1.5GB/s; larger writes normalize to 35MB/s
once we've dirtied enough memory and flushing/reclaim kicks in).
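The caching effect described above is easy to see on any local file
with plain dd. The path and sizes below are my own examples, not from
the thread; the first run is largely absorbed by the page cache, while
the second waits for each 1MB write to reach stable storage:

```shell
# Buffered vs. per-write sync throughput on a plain local file
# (example path/sizes; bump count up to exceed available cache).
f=/tmp/ddtest.img
dd if=/dev/zero of="$f" bs=1M count=64 2>&1 | tail -n 1            # cached
dd if=/dev/zero of="$f" bs=1M count=64 oflag=sync 2>&1 | tail -n 1 # sync
rm -f "$f"
```

The gap between the two reported rates is roughly what disappears once
flushing/reclaim kicks in on the larger buffered runs.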
It also appears that a close() on the loop device aggressively flushes
whatever data hasn't been flushed yet (fuse does something similar on
open()). My non-sync dd results tend to jump around, so perhaps that is
part of the reason why.

> The situation inside the Xen instance was similar, although with
> different numbers.
>
> After being on, but mostly idle, for ~5 days:
> torbjorn@hennec:~$ sudo dd if=/dev/zero of=bigfile bs=1024k count=2000
> oflag=direct
> 28311552 bytes (28 MB) copied, 35.1314 s, 806 kB/s
> ^C
>
> After a reboot and a fresh loop device:
> torbjorn@hennec:~$ sudo dd if=/dev/zero of=bigfile bs=1024k count=2000
> oflag=direct
> 814743552 bytes (815 MB) copied, 34.7441 s, 23.4 MB/s
> ^C
>
> These numbers might indicate that loop device performance degrades
> over time. However, I haven't seen this on local filesystems, so is
> this possibly only with files on Gluster or FUSE?

I would expect this kind of behavior when caching is involved, as
described above, but I'm not quite sure what would cause it with sync
I/O.

> I'm on Debian stable, so things aren't exactly box fresh.
> torbjorn@xen01:~$ dpkg -l | grep "^ii linux-image-$(uname -r)"
> ii linux-image-2.6.32-5-xen-amd64 2.6.32-46
> Linux 2.6.32 for 64-bit PCs, Xen dom0 support
>
> I'm not sure how to debug the Gluster -> FUSE -> loop device
> interaction, but I might try a newer kernel on the client.
>

From skimming through the code and watching a writepage tracepoint, I
think the high-level situation is as follows:

- write()s to the loop (block) device hit the page cache as buffers.
  This data is subject to similar caching/writeback behavior as on a
  local filesystem (e.g., write returns once the data is cached; if the
  write is sync, it waits on a flush before returning).
- Flushing eventually kicks in; it is page based and results in a bunch
  of writepage requests. The block/buffer handling code converts these
  writepage requests into 4k I/O (bio) requests.
- These 4k I/O requests hit loop.
  In the file-backed case, loop issues write() requests to the
  underlying file.
- In a local filesystem, I believe this would result in further caching
  in the local filesystem's mapping. In the case of fuse, requests to
  userspace are submitted immediately, so gluster now receives 4k write
  requests rather than the 128k requests it sees when writing to the
  file directly via dd with a 1MB buffer size.

Given that you can reproduce the sync write variance without Xen, I
would rule Xen out for the time being and suggest the following:

- Give the profile tool a try to compare the local loop case when
  throughput is higher vs. lower. It would be interesting to see if
  anything jumps out that could help explain what is happening
  differently between the runs.
- See what caching/performance translators are enabled in your gluster
  client graph (the volname-fuse.vol volfile) and try disabling them
  one at a time, e.g.:

      gluster volume set myvol io-cache disable

  (repeat for write-behind, read-ahead, quick-read, etc.) ... and see
  if you get any more consistent results (good or bad).
- Out of curiosity (and if you're running a recent enough gluster), try
  the fopen-keep-cache mount option on your gluster mount and see if it
  changes any behavior, particularly with a freshly mapped loop device.

Brian
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
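P.S. The fragmentation described above is easy to quantify. Assuming a
4k page size and a 128k fuse write size (both typical, but assumptions
on my part; check your system), the same 1MB dd buffer turns into very
different request counts on the two paths:

```shell
# Back-of-the-envelope request counts for one 1 MiB dd buffer.
write_size=$((1024 * 1024))   # 1 MiB dd buffer
page_size=4096                # typical x86 page size (assumption)
fuse_write=$((128 * 1024))    # 128 KiB fuse write size (assumption)
echo "via loop writeback: $((write_size / page_size)) requests"
echo "via direct dd:      $((write_size / fuse_write)) requests"
```

That 256-vs-8 difference in per-request overhead would line up with the
throughput drop seen through the loop device.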