Thanks for the help.

That seems to have fixed it.

We were seeing hangs clocking up at a rate of a few hundred a day, and for the last week there have been none.



On 03/31/2017 05:54 AM, Mohit Agrawal wrote:
Hi,

As you have mentioned the client/server versions in the thread, it shows that the package versions are different on the two sides (client and server).
We would recommend upgrading both servers and clients to rhs-3.10.1.
If it is not possible to upgrade both (client and server), then it is enough to upgrade the client only.
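
A quick way to confirm what is installed on each side is to print the glusterfs version on both the clients and the servers and compare. A minimal sketch, assuming the glusterfs binary is on the PATH:

    #!/usr/bin/env python
    # print_gluster_version.py -- print the locally installed glusterfs version.
    # Run this on both the clients and the servers and compare the output.
    import subprocess

    def local_gluster_version():
        # "glusterfs --version" prints a first line like "glusterfs 3.8.9 ..."
        out = subprocess.check_output(["glusterfs", "--version"])
        return out.decode("utf-8", "replace").splitlines()[0]

    if __name__ == "__main__":
        print(local_gluster_version())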

Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal <moagr...@redhat.com> wrote:

    Hi,

    As per the attached glusterdump/stackdump it seems this is a known
    issue (https://bugzilla.redhat.com/show_bug.cgi?id=1372211) and the
    issue is already fixed by the patch
    (https://review.gluster.org/#/c/15380/).

    The issue happens in this case. Assume a file is opened with fd1 and fd2:

    1. Some WRITE ops to fd1 got an error; they were added back to the
       'todo' queue because of those errors.
    2. fd2 is closed, so a FLUSH op is sent to write-behind.
    3. The FLUSH cannot be unwound because it is not a legal waiter for
       those failed writes (as the function __wb_request_waiting_on()
       says), and those failed WRITEs also cannot be ended while fd1 is
       not closed. So fd2 gets stuck in the close syscall.

    The statedump also shows that the flush op's fd is not the same as the
    write op's fd. Kindly upgrade the package to 3.10.1 and share the
    result.

    Thanks,
    Mohit Agrawal
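
    For reference, a minimal sketch of the fd1/fd2 access pattern described
    above, using a hypothetical path on a FUSE-mounted gluster volume;
    whether the buffered writes actually fail (step 1) depends on the
    backend state, and on an affected client the close() of fd2 is where
    the caller would hang:

        #!/usr/bin/env python
        # flush_hang_sketch.py -- illustrative only: the two-fd pattern from
        # the bug description. PATH is a placeholder for a file on a
        # FUSE-mounted gluster volume.
        import os

        PATH = "/mnt/glustervol/testfile"   # hypothetical mount point and file

        fd1 = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o644)
        fd2 = os.open(PATH, os.O_WRONLY)

        # Writes on fd1 are buffered by write-behind; in the bug scenario
        # these writes later fail and are put back on the 'todo' queue.
        os.write(fd1, b"data buffered by write-behind\n")

        # Closing fd2 sends a FLUSH; with the bug present and failed writes
        # pending on fd1, this close() is where the process gets stuck.
        os.close(fd2)
        os.close(fd1)
        print("close(fd2) returned; no hang on this client")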

    On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:

    > Hi Alvin,
    >
    > Thanks for the dump output. It helped a bit.
    >
    > For now, I recommend turning off the open-behind and read-ahead
    > performance translators for you to get rid of this situation, as I
    > noticed hung FLUSH operations from these translators.
    Looks like I gave the wrong advice; looking at the snippet below:

    [global.callpool.stack.61]
    stack=0x7f6c6f628f04
    uid=48
    gid=48
    pid=11077
    unique=10048797
    lk-owner=a73ae5bdb5fcd0d2
    op=FLUSH
    type=1
    cnt=5

    [global.callpool.stack.61.frame.1]
    frame=0x7f6c6f793d88
    ref_count=0
    translator=edocs-production-write-behind
    complete=0
    parent=edocs-production-read-ahead
    wind_from=ra_flush
    wind_to=FIRST_CHILD(this)->fops->flush
    unwind_to=ra_flush_cbk

    [global.callpool.stack.61.frame.2]
    frame=0x7f6c6f796c90
    ref_count=1
    translator=edocs-production-read-ahead
    complete=0
    parent=edocs-production-open-behind
    wind_from=default_flush_resume
    wind_to=FIRST_CHILD(this)->fops->flush
    unwind_to=default_flush_cbk

    [global.callpool.stack.61.frame.3]
    frame=0x7f6c6f79b724
    ref_count=1
    translator=edocs-production-open-behind
    complete=0
    parent=edocs-production
    wind_from=io_stats_flush
    wind_to=FIRST_CHILD(this)->fops->flush
    unwind_to=io_stats_flush_cbk

    [global.callpool.stack.61.frame.4]
    frame=0x7f6c6f79b474
    ref_count=1
    translator=edocs-production
    complete=0
    parent=fuse
    wind_from=fuse_flush_resume
    wind_to=FIRST_CHILD(this)->fops->flush
    unwind_to=fuse_err_cbk

    [global.callpool.stack.61.frame.5]
    frame=0x7f6c6f796684
    ref_count=1
    translator=fuse
    complete=0
    Most probably the issue is with write-behind's flush, so please turn off
    write-behind and test. If you don't have any hung httpd processes, please
    let us know.
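
    A minimal sketch of turning the option off through the gluster CLI,
    assuming the volume name is edocs-production (inferred from the
    translator names in the statedump) and that it is run on a node where
    the gluster CLI is available; the same pattern applies to the
    performance.open-behind and performance.read-ahead options mentioned
    earlier:

        #!/usr/bin/env python
        # disable_write_behind.py -- illustrative wrapper around the gluster CLI.
        # VOLNAME is assumed from the statedump translator names; adjust as needed.
        import subprocess

        VOLNAME = "edocs-production"

        def set_volume_option(volume, option, value):
            # Equivalent to running: gluster volume set <volume> <option> <value>
            subprocess.check_call(["gluster", "volume", "set", volume, option, value])

        if __name__ == "__main__":
            set_volume_option(VOLNAME, "performance.write-behind", "off")

    Turning write-behind off is usually a diagnostic step rather than a
    permanent fix, since it can reduce write performance.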

    -Amar


    > -Amar
    >
    > On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
    >
    >> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and
    >> on the clients 3.7.11-2 on Centos 6.8
    >>
    >> We are seeing httpd processes hang in fuse_request_send or sync_page.
    >>
    >> These calls are from PHP 5.3.3-48 scripts
    >>
    >> I am attaching a tgz file that contains the process dump from
    >> glusterfsd and the hung pids along with the offending pid's stacks
    >> from /proc/{pid}/stack.
    >>
    >> This has been a low level annoyance for a while but it has become a
    >> much bigger issue because the number of hung processes went from a
    >> few a week to a few hundred a day.
    >>
    >> --
    >> Alvin Starr || voice: (905)513-7688
    >> Netvel Inc. || Cell: (416)806-0133
    >> alvin at netvel.net ||
    >
    > --
    > Amar Tumballi (amarts)
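
    A minimal sketch of collecting the per-process kernel stacks referred to
    in the report above (/proc/{pid}/stack), assuming root access and that
    the hung pids are passed on the command line:

        #!/usr/bin/env python
        # collect_stacks.py -- illustrative: dump /proc/<pid>/stack for the
        # hung pids named on the command line. Reading /proc/<pid>/stack
        # requires root.
        import sys

        for arg in sys.argv[1:]:
            pid = int(arg)
            try:
                with open("/proc/%d/stack" % pid) as f:
                    stack = f.read()
            except (IOError, OSError) as err:
                stack = "unreadable: %s\n" % err
            print("==== pid %d ====" % pid)
            print(stack)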

--
Alvin Starr                   ||   voice: (905)513-7688
Netvel Inc.                   ||   Cell:  (416)806-0133
al...@netvel.net              ||
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
