On Mon, Jul 25, 2016 at 7:12 PM, Atin Mukherjee <[email protected]> wrote:
> On Mon, Jul 25, 2016 at 5:37 PM, Atin Mukherjee <[email protected]>
> wrote:
>
>> On Mon, Jul 25, 2016 at 4:34 PM, Avra Sengupta <[email protected]>
>> wrote:
>>
>>> The crux of the problem is that, as of today, brick processes on restart
>>> try to reuse the old port they were using (assuming that no other process
>>> will be using it, and not consulting pmap_registry_alloc() before using
>>> it). With a recent change, pmap_registry_alloc() reassigns older ports
>>> that were used but are now free. Hence snapd now gets a port that was
>>> previously used by a brick and tries to bind to it, whereas the older
>>> brick process, without consulting the pmap table, blindly tries to
>>> connect to it, and hence we see this problem.
>>>
>>> Now coming to the fix: I feel the brick process should not try to get
>>> the older port and should just take a new port every time it comes up.
>>> We will not run out of ports with this change because pmap now allocates
>>> old ports again, and the port previously used by the brick process will
>>> eventually be reused. If anyone sees any concern with this approach,
>>> please feel free to raise it now.
>>
>> Looks to be OK, but I'll think it through and get back to you in a day
>> or two if I have any objections.
>
> If we are conservative about bricks not binding to a different port on a
> restart, I have an alternative approach here [1]. It has neither a
> full-fledged commit message nor a BZ; I've just put it up for your input.

Read it as "binding" instead of "not binding".

> [1] http://review.gluster.org/15005
>
>>> While awaiting feedback from you guys, I have sent this patch
>>> (http://review.gluster.org/15001), which moves the said test case to
>>> bad tests for now; after we collectively reach a conclusion on the fix,
>>> we will remove it from bad tests.
>>>
>>> Regards,
>>> Avra
>>>
>>> On 07/25/2016 02:33 PM, Avra Sengupta wrote:
>>>
>>> The failure suggests that the port snapd is trying to bind to is already
>>> in use. But snapd has been modified to use a new port every time. I am
>>> looking into this.
>>>
>>> On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
>>>
>>> More failures:
>>>
>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console
>>>
>>> I see these messages in the snapd.log:
>>>
>>> [2016-07-22 05:31:52.482282] I
>>> [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
>>> rpc.outstanding-rpc-limit with value 64
>>> [2016-07-22 05:31:52.482352] W [MSGID: 101002]
>>> [options.c:954:xl_opt_validate] 0-patchy-server: option 'listen-port' is
>>> deprecated, preferred is 'transport.socket.listen-port', continuing with
>>> correction
>>> [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind]
>>> 0-tcp.patchy-server: binding to failed: Address already in use
>>> [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind]
>>> 0-tcp.patchy-server: Port is already in use
>>> [2016-07-22 05:31:52.482459] W [rpcsvc.c:1630:rpcsvc_create_listener]
>>> 0-rpc-service: listening on transport failed
>>> [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init]
>>> 0-patchy-server: creation of listener failed
>>> [2016-07-22 05:31:52.482481] E [MSGID: 101019] [xlator.c:433:xlator_init]
>>> 0-patchy-server: Initialization of volume 'patchy-server' failed, review
>>> your volfile again
>>> [2016-07-22 05:31:52.482491] E [MSGID: 101066]
>>> [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing
>>> translator failed
>>> [2016-07-22 05:31:52.482499] E [MSGID: 101176]
>>> [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
>>>
>>> On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> The following test has failed 3 times in the last two days -
>>>>
>>>> ./tests/bugs/snapshot/bug-1316437.t
>>>>
>>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
>>>>
>>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
>>>>
>>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull
>>>>
>>>> Please take a look at it and check whether it is a spurious failure
>>>> or not.
>>>>
>>>> Ashish
>>
>> --
>> --Atin
>
> --
> --Atin
>
--
--Atin
_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel
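
The failure mode in the snapd.log above is simply EADDRINUSE at bind()
time. To make the race concrete, here is a minimal, self-contained C
sketch; it is illustrative only, not Gluster source, and the remembered
port number and fallback policy are assumptions made for the example. A
daemon first tries to rebind the port it used on its last run and, if
that port has since been handed to someone else, asks the kernel for a
fresh one instead, which is essentially the "take a new port every time
it comes up" behavior proposed above:

/*
 * Illustrative sketch only; NOT Gluster source. A daemon that prefers
 * its old port but, on EADDRINUSE, falls back to a kernel-assigned
 * port instead of colliding with the port's new owner.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

static int bind_port(int sock, uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    return bind(sock, (struct sockaddr *)&addr, sizeof(addr));
}

int main(void)
{
    /* Hypothetical port remembered from this daemon's previous run. */
    uint16_t old_port = 49152;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0) {
        perror("socket");
        return 1;
    }

    if (bind_port(sock, old_port) < 0) {
        if (errno != EADDRINUSE) {
            perror("bind");
            return 1;
        }
        /*
         * The old port is now owned by another process (the situation
         * the snapd.log shows). Instead of failing, take a fresh
         * kernel-assigned port: binding to port 0 lets the kernel pick
         * any free ephemeral port.
         */
        if (bind_port(sock, 0) < 0) {
            perror("bind");
            return 1;
        }
    }

    if (listen(sock, 16) == 0) {
        struct sockaddr_in bound;
        socklen_t len = sizeof(bound);

        /* Report which port we actually got. */
        getsockname(sock, (struct sockaddr *)&bound, &len);
        printf("listening on port %u\n", (unsigned)ntohs(bound.sin_port));
    }

    close(sock);
    return 0;
}

The trade-off discussed in the thread is visible in the sketch: taking a
fresh port on every restart avoids the collision entirely, but it means
clients have to rediscover the brick's port through the portmapper,
whereas Atin's alternative [1] stays conservative about bricks binding
to a different port on restart.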
