On Tue, Jul 26, 2016 at 12:34 AM, Niels de Vos <[email protected]> wrote:
> On Mon, Jul 25, 2016 at 04:34:17PM +0530, Avra Sengupta wrote:
> > The crux of the problem is that as of today, brick processes on
> > restart try to reuse the old port they were using (assuming that no
> > other process will be using it, and not consulting
> > pmap_registry_alloc() before using it). With a recent change,
> > pmap_registry_alloc() reassigns older ports that were used but are
> > now free. Hence snapd now gets a port that was previously used by a
> > brick and tries to bind to it, whereas the older brick process,
> > without consulting the pmap table, blindly tries to connect to it,
> > and hence we see this problem.
> >
> > Now coming to the fix, I feel the brick process should not try to get
> > the older port and should just take a new port every time it comes
> > up. We will not run out of ports with this change because pmap now
> > allocates old ports again, and the previous port used by the brick
> > process will eventually be reused. If anyone sees any concern with
> > this approach, please feel free to raise it now.
>
> I wonder how this is handled with reconnecting clients. If a client
> thinks it was connected to a brick, but the connection was lost, does
> it try to connect to the same port again? I don't know if it really
> connects to the pmap service in GlusterD to find the new/updated
> port...

The client does query the portmap every time, using
client_query_portmap() in the reconnect logic. So if the port has
changed, it goes through rpc_clnt_reconfig(), which ensures the client
talks to the brick process(es) on the new port.
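To make that flow concrete, here is a rough standalone sketch of the
reconnect path. It is only an illustration: the struct and helper names
below are made up, and the real logic lives in client_query_portmap()
and rpc_clnt_reconfig() in the protocol/client and rpc-clnt code.

    /* Illustrative sketch only -- not the actual GlusterFS code. */
    #include <stdio.h>

    struct clnt_conn {
        const char *brick;
        int         port;         /* port we last connected to */
    };

    /* Stand-in for asking GlusterD's pmap service which port the brick
     * is currently listening on. */
    static int pmap_query_port(const char *brick)
    {
        (void)brick;
        return 49153;              /* pretend the brick came back on a new port */
    }

    /* Stand-in for rpc_clnt_reconfig(): point the transport at the new port. */
    static void clnt_reconfig(struct clnt_conn *conn, int new_port)
    {
        printf("reconfiguring %s: %d -> %d\n", conn->brick, conn->port, new_port);
        conn->port = new_port;
    }

    static void clnt_connect(struct clnt_conn *conn)
    {
        printf("connecting to %s on port %d\n", conn->brick, conn->port);
    }

    /* Called whenever the connection to the brick is lost. */
    static void clnt_reconnect(struct clnt_conn *conn)
    {
        int port = pmap_query_port(conn->brick);  /* always ask pmap first */

        if (port != conn->port)
            clnt_reconfig(conn, port);            /* brick moved; follow it */

        clnt_connect(conn);
    }

    int main(void)
    {
        struct clnt_conn conn = { "patchy-brick0", 49152 };

        clnt_reconnect(&conn);  /* never blindly reuses the cached port */
        return 0;
    }

So even if a restarted brick (or snapd) comes up on a different port,
the client picks it up on the next reconnect attempt rather than
retrying the stale port.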
> Niels
>
> > While awaiting feedback from you guys, I have sent this patch
> > (http://review.gluster.org/15001), which moves the said test case to
> > bad tests for now. After we collectively reach a conclusion on the
> > fix, we will remove it from bad tests.
> >
> > Regards,
> > Avra
> >
> > On 07/25/2016 02:33 PM, Avra Sengupta wrote:
> > > The failure suggests that the port snapd is trying to bind to is
> > > already in use. But snapd has been modified to use a new port every
> > > time. I am looking into this.
> > >
> > > On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
> > > > More failures:
> > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console
> > > >
> > > > I see these messages in the snapd.log:
> > > >
> > > > [2016-07-22 05:31:52.482282] I [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
> > > > [2016-07-22 05:31:52.482352] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-patchy-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
> > > > [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind] 0-tcp.patchy-server: binding to  failed: Address already in use
> > > > [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind] 0-tcp.patchy-server: Port is already in use
> > > > [2016-07-22 05:31:52.482459] W [rpcsvc.c:1630:rpcsvc_create_listener] 0-rpc-service: listening on transport failed
> > > > [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init] 0-patchy-server: creation of listener failed
> > > > [2016-07-22 05:31:52.482481] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-patchy-server: Initialization of volume 'patchy-server' failed, review your volfile again
> > > > [2016-07-22 05:31:52.482491] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing translator failed
> > > > [2016-07-22 05:31:52.482499] E [MSGID: 101176] [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
> > > >
> > > > On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey <[email protected]> wrote:
> > > >
> > > >     Hi,
> > > >
> > > >     The following test has failed 3 times in the last two days:
> > > >
> > > >     ./tests/bugs/snapshot/bug-1316437.t
> > > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
> > > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
> > > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull
> > > >
> > > >     Please take a look at it and check whether it is a spurious failure or not.
> > > >
> > > >     Ashish
> > > >
> > > >     _______________________________________________
> > > >     Gluster-devel mailing list
> > > >     [email protected]
> > > >     http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >
> > > >

--
--Atin
