Sorry for the late reply.

Pranith Kumar Karampuri <pkara...@redhat.com> wrote on 2016/01/25 17:48:06:
> From: Pranith Kumar Karampuri <pkara...@redhat.com>
> To: li.ping...@zte.com.cn
> Cc: li.y...@zte.com.cn, zhou.shigan...@zte.com.cn,
> liu.jianj...@zte.com.cn, yang.bi...@zte.com.cn
> Date: 2016/01/25 17:48
> Subject: Re: [Gluster-devel] Gluster AFR volume write performance
> has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
>
> On 01/25/2016 03:09 PM, li.ping...@zte.com.cn wrote:
> Hi Pranith,
>
> I'd be willing to have the chance to contribute to open source. It's my
> first time delivering a patch for GlusterFS, so I'm not quite familiar
> with the code review and submission procedures.
>
> I'll try to make it ASAP. By the way, are there any guidelines for this work?
>
> http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow
> may be helpful. Feel free to ask about any doubt you may have.
>
> How do you guys use glusterfs?
>
> Pranith

Thanks for the tips. We currently use GlusterFS to build shared storage
for distributed cluster nodes.

Here are the solutions I have been pondering over these days:

1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND modification. This
optimization only pays off for appending write fops, and most writes are
not of that kind. I do not think it is worth optimizing for a
low-probability case at the cost of a performance drop for the vast
majority of AFR writes.

2. Make the fixed GLUSTERFS_WRITE_IS_APPEND dictionary key in afr_writev
dynamic, i.e. add a new configurable option "write_is_append" alongside
the existing "ensure-durability" option for AFR. It could be switched on
when AFR write performance is not the main concern, and off when
performance is demanded (a rough sketch follows below).

I have also been trying to find a way for posix_writev to predict an
appending write in advance, so that it can decide whether to take the
lock at all and hold it for as short a time as possible, but I have had
no luck so far. Any other good ideas are appreciated.
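Here is the rough sketch of what I have in mind for option 2. It is
illustrative only, not a tested patch: the "write-is-append" option name
and the priv->write_is_append field are assumptions, modeled on how the
existing "ensure-durability" option is wired into AFR:

    /* In afr_writev(): only request append detection from the bricks
     * when the (hypothetical) write-is-append option is on, so the
     * serialized path in __posix_writev() is taken only on demand. */
    afr_private_t *priv = this->private;

    if (priv->write_is_append) {
            ret = dict_set_str (local->xdata_req,
                                GLUSTERFS_WRITE_IS_APPEND, "yes");
            if (ret)
                    gf_log (this->name, GF_LOG_WARNING,
                            "failed to set the write-is-append hint");
    }

Something like "gluster volume set <volname> write-is-append off" would
then restore the parallel write path.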
Ping.Li

> Thanks & Best Regards.
>
> Pranith Kumar Karampuri <pkara...@redhat.com> wrote on 2016/01/23 14:01:36:
>
> > From: Pranith Kumar Karampuri <pkara...@redhat.com>
> > To: li.ping...@zte.com.cn, gluster-devel@gluster.org
> > Cc: li.y...@zte.com.cn, liu.jianj...@zte.com.cn,
> > zhou.shigan...@zte.com.cn, yang.bi...@zte.com.cn
> > Date: 2016/01/23 14:02
> > Subject: Re: [Gluster-devel] Gluster AFR volume write performance
> > has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
> >
> > On 01/22/2016 07:14 AM, li.ping...@zte.com.cn wrote:
> > Hi Pranith, your reply is appreciated.
> >
> > Pranith Kumar Karampuri <pkara...@redhat.com> wrote on 2016/01/20 18:51:19:
> >
> > > From: Pranith Kumar Karampuri <pkara...@redhat.com>
> > > To: li.ping...@zte.com.cn, gluster-devel@gluster.org
> > > Date: 2016/01/20 18:51
> > > Subject: Re: [Gluster-devel] Gluster AFR volume write performance
> > > has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
> > >
> > > Sorry for the delay in response.
> > >
> > > On 01/15/2016 02:34 PM, li.ping...@zte.com.cn wrote:
> > > The GLUSTERFS_WRITE_IS_APPEND setting in the afr_writev function at
> > > the glusterfs client end makes posix_writev at the server end handle
> > > IO write fops serially instead of in parallel.
> > >
> > > i.e. multiple io-worker threads carrying out IO write fops are
> > > blocked in posix_writev and execute the final write fop
> > > pwrite/pwritev in the __posix_writev function ONE AFTER ANOTHER.
> > >
> > > For example:
> > >
> > > thread1: iot_worker -> ... -> posix_writev()    |
> > > thread2: iot_worker -> ... -> posix_writev()    |
> > > thread3: iot_worker -> ... -> posix_writev() -> __posix_writev()
> > > thread4: iot_worker -> ... -> posix_writev()    |
> > >
> > > There are 4 iot_worker threads doing 128KB IO write fops as above,
> > > but only one can execute the __posix_writev function; the others
> > > have to wait.
> > >
> > > However, if the afr volume is configured with storage.linux-aio,
> > > which is off by default, the iot_worker threads use posix_aio_writev
> > > instead of posix_writev to write data. The posix_aio_writev function
> > > is not affected by GLUSTERFS_WRITE_IS_APPEND, and the AFR volume
> > > write performance goes up.
> > > I think this is a bug :-(.
> >
> > Yeah, I agree with you. I suppose GLUSTERFS_WRITE_IS_APPEND is misused
> > in afr_writev. I checked the original intent of the
> > GLUSTERFS_WRITE_IS_APPEND change on the review website:
> > http://review.gluster.org/#/c/5501/
> >
> > The initial purpose seems to be to avoid an unnecessary fsync() in the
> > afr_changelog_post_op_safe function when the write lands at the current
> > end of the file, detected in posix_writev by
> > (preop.ia_size == offset || (fd->flags & O_APPEND)).
> >
> > In comparison with the AFR write performance loss, I think it costs
> > too much.
> >
> > I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable,
> > just like ensure-durability in afr.
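As an aside, to illustrate why that detection serializes writers: the
size check and the write itself must be atomic with respect to other
writers, otherwise a concurrent write could change the file size between
the two steps, so every write to the file funnels through one lock. A
minimal standalone model of the pattern (simplified; this is NOT the
actual __posix_writev code):

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Check "does this write land at EOF?" and perform the write as one
     * atomic step; the lock is what serializes all writers. */
    static ssize_t write_with_append_check (int fd, const void *buf,
                                            size_t len, off_t offset,
                                            int *is_append)
    {
            struct stat st;
            ssize_t ret = -1;

            pthread_mutex_lock (&file_lock);
            if (fstat (fd, &st) == 0) {
                    /* mirrors: preop.ia_size == offset */
                    *is_append = (st.st_size == offset);
                    ret = pwrite (fd, buf, len, offset);
            }
            pthread_mutex_unlock (&file_lock);
            return ret;
    }

    int main (void)
    {
            const char *msg = "hello\n";
            int is_append = 0;
            int fd = open ("demo.dat", O_CREAT | O_WRONLY, 0644);

            if (fd < 0)
                    return 1;
            write_with_append_check (fd, msg, strlen (msg), 0, &is_append);
            printf ("append detected: %d\n", is_append);
            close (fd);
            return 0;
    }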
> > You are right, it doesn't make sense to put this option in the
> > dictionary if ensure-durability is off. http://review.gluster.org/13285
> > addresses this. Do you want to try it out?
> > Thanks for doing most of the work :-). Do let me know if you want to
> > raise a bug for this. Or I can take that up if you don't have time.
> >
> > Pranith
> >
> > > So, my question is whether an AFR volume can work fine with the
> > > storage.linux-aio configuration, which bypasses the
> > > GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, and why glusterfs
> > > keeps posix_aio_writev different from posix_writev?
> > >
> > > Any replies to clear my confusion would be appreciated; thanks in advance.
> > >
> > > What is the workload you have? multiple writers on same file workloads?
> >
> > I tested the afr gluster volume with fio like this:
> >
> > fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k \
> >     --size=20G --numjobs=8 --runtime=60 --group_reporting \
> >     --name=afr_test --iodepth=1 --ioengine=libaio
> >
> > The GlusterFS bricks are two IBM X3550 M3 servers.
> >
> > The local disk direct write performance with a 128KB IO request block
> > size is about 18MB/s with a single thread and 80MB/s with 8 threads.
> >
> > If GLUSTERFS_WRITE_IS_APPEND is set, the afr gluster volume write
> > performance is 18MB/s, the same as a single thread; if not, the
> > performance is nearly 75MB/s. (Network bandwidth is not the bottleneck.)
> >
> > Pranith
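For completeness, this is how I intend to verify
http://review.gluster.org/13285 once I have it applied. I am assuming
the switch is the existing cluster.ensure-durability option discussed
above; with it off, the patch should stop sending the append hint and
restore the parallel path:

    # assumed verification workflow, not yet run:
    gluster volume set afrvol cluster.ensure-durability off
    # ...then repeat the fio job quoted above and compare against the
    # 18MB/s (serialized) vs ~75MB/s (parallel) numbers.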
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel