Had a look at the patch. What you are trying to do is to re-use the old port and, if that is not successful, fall back to a new port. I have left some comments in the patch, but to me this looks mostly fine.

On 07/25/2016 07:14 PM, Atin Mukherjee wrote:


On Mon, Jul 25, 2016 at 7:12 PM, Atin Mukherjee <amukh...@redhat.com> wrote:



    On Mon, Jul 25, 2016 at 5:37 PM, Atin Mukherjee
    <amukh...@redhat.com> wrote:



        On Mon, Jul 25, 2016 at 4:34 PM, Avra Sengupta
        <aseng...@redhat.com> wrote:

            The crux of the problem is that, as of today, brick
            processes on restart try to reuse the old port they were
            using, assuming that no other process will be using it
            and without consulting pmap_registry_alloc() first. With
            a recent change, pmap_registry_alloc() reassigns older
            ports that were in use but are now free. Hence snapd now
            gets a port that was previously used by a brick and tries
            to bind to it, while the older brick process, without
            consulting the pmap table, blindly tries to bind to the
            same port, and hence we see this problem.
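
            To make the missing check concrete, here is a minimal
            sketch of what "consulting the pmap table before reusing
            the old port" could look like. pmap_port_is_free() is a
            hypothetical helper, and pmap_registry_alloc() is only
            assumed to hand out a free port; the real registry API in
            glusterd-pmap.c differs in detail.

            #include <stdbool.h>
            #include <stdio.h>

            /* Stubs standing in for glusterd's pmap registry
             * (assumptions, not the real signatures). */
            static bool pmap_port_is_free(int port)
            {
                (void)port;
                return false;   /* pretend snapd took the old port */
            }

            static int pmap_registry_alloc(void)
            {
                return 49153;   /* pretend this is the next free port */
            }

            /* The check that is missing today: reuse the old port only
             * if the registry still considers it free, otherwise fall
             * back to a freshly allocated one. */
            static int brick_pick_listen_port(int old_port)
            {
                if (old_port > 0 && pmap_port_is_free(old_port))
                    return old_port;
                return pmap_registry_alloc();
            }

            int main(void)
            {
                /* 49152 was handed to snapd meanwhile, so the brick
                 * falls back instead of fighting over the port. */
                printf("brick listens on %d\n",
                       brick_pick_listen_port(49152));
                return 0;
            }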

            Now coming to the fix, I feel a brick process should not
            try to get its older port and should just take a new port
            every time it comes up. We will not run out of ports with
            this change because pmap now allocates old ports again,
            so the port previously used by the brick process will
            eventually be reused. If anyone sees any concern with
            this approach, please raise it now.
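
            As a toy illustration of why the port space is not
            exhausted under this scheme (this is not glusterd code; a
            made-up lowest-free-port pool stands in for the pmap
            registry):

            #include <stdio.h>

            #define PORT_BASE 49152
            #define PORT_MAX  49155   /* tiny range, to show recycling */

            static char in_use[PORT_MAX - PORT_BASE + 1];

            static int pool_alloc(void)   /* lowest free port wins */
            {
                for (int p = PORT_BASE; p <= PORT_MAX; p++) {
                    if (!in_use[p - PORT_BASE]) {
                        in_use[p - PORT_BASE] = 1;
                        return p;
                    }
                }
                return -1;
            }

            static void pool_free(int p)
            {
                in_use[p - PORT_BASE] = 0;
            }

            int main(void)
            {
                int brick = pool_alloc();    /* brick gets 49152 */
                pool_free(brick);            /* brick dies; port freed */
                int snapd = pool_alloc();    /* snapd reuses 49152 */
                int brick2 = pool_alloc();   /* restarted brick simply
                                                takes 49153 */
                printf("snapd=%d restarted brick=%d\n", snapd, brick2);
                return 0;
            }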


        Looks OK to me, but I'll think it through and get back to you
        in a day or two if I have any objections.


    If we are conservative about bricks binding to a different port
    on a restart, I have an alternative approach here [1]. It has
    neither a full-fledged commit message nor a BZ; I've just put it
    up for your input.


    [1] http://review.gluster.org/15005



            While awaiting feedback from you guys, I have sent this
            patch (http://review.gluster.org/15001), which moves the
            said test case to the bad tests for now. Once we
            collectively reach a conclusion on the fix, we will
            remove it from the bad tests.

            Regards,
            Avra


            On 07/25/2016 02:33 PM, Avra Sengupta wrote:
            The failure suggests that the port snapd is trying to
            bind to is already in use, but snapd has been modified to
            use a new port every time. I am looking into this.

            On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
            More failures:
            https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console

            I see these messages in the snapd.log:

            [2016-07-22 05:31:52.482282] I [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
            [2016-07-22 05:31:52.482352] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-patchy-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
            [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind] 0-tcp.patchy-server: binding to  failed: Address already in use
            [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind] 0-tcp.patchy-server: Port is already in use
            [2016-07-22 05:31:52.482459] W [rpcsvc.c:1630:rpcsvc_create_listener] 0-rpc-service: listening on transport failed
            [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init] 0-patchy-server: creation of listener failed
            [2016-07-22 05:31:52.482481] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-patchy-server: Initialization of volume 'patchy-server' failed, review your volfile again
            [2016-07-22 05:31:52.482491] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing translator failed
            [2016-07-22 05:31:52.482499] E [MSGID: 101176] [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
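
            For reference, the "Address already in use" error above
            is plain EADDRINUSE from bind(2). A standalone sketch
            (not gluster code; the port number is arbitrary) that
            reproduces the same failure:

            #include <arpa/inet.h>
            #include <errno.h>
            #include <netinet/in.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/socket.h>
            #include <unistd.h>

            static int bind_port(unsigned short port)
            {
                int fd = socket(AF_INET, SOCK_STREAM, 0);
                struct sockaddr_in addr;

                if (fd < 0)
                    return -1;

                memset(&addr, 0, sizeof(addr));
                addr.sin_family = AF_INET;
                addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
                addr.sin_port = htons(port);

                if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                    fprintf(stderr, "bind(%u) failed: %s\n",
                            (unsigned)port, strerror(errno));
                    close(fd);
                    return -1;
                }
                listen(fd, 1);
                return fd;
            }

            int main(void)
            {
                int a = bind_port(49152);  /* first bind succeeds */
                int b = bind_port(49152);  /* fails: EADDRINUSE, the
                                              same error snapd logs */
                if (a >= 0) close(a);
                if (b >= 0) close(b);
                return 0;
            }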

            On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey
            <aspan...@redhat.com> wrote:

                Hi,

                The following test has failed 3 times in the last two days -

                ./tests/bugs/snapshot/bug-1316437.t
                https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
                https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull

                Please take a look at it and check whether it is a
                spurious failure or not.

                Ashish

--
--Atin

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
