Hi:

I made a debugging error and the problem has nothing to do with FIONBIO.
The real problem is that epoll_ctl() fails with errno set to EBADF when called as follows.

    epoll_ctl(loop->backend_fd, EPOLL_CTL_ADD, fd, &e)

This happens in uv__check_fd() and means either the backend_fd or fd is bad.
I think it is far more likely that the backend_fd is bad, but I don't know why that would happen.
Any ideas ?
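
One way to narrow it down is to probe each descriptor independently with fcntl(F_GETFD), which has no
side effects and fails with EBADF on a stale descriptor. A minimal sketch (fd_is_valid() and
log_bad_fds() are just illustrative helpers here, not libuv routines):

    #include <errno.h>
    #include <fcntl.h>
    #include <syslog.h>

    /* 1 if the descriptor is open, 0 if the kernel reports EBADF for it. */
    static int fd_is_valid(int fd)
    {
        return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
    }

    /* Drop this in just before the failing epoll_ctl() call. */
    static void log_bad_fds(int backend_fd, int fd)
    {
        if (!fd_is_valid(backend_fd))
            syslog(LOG_DEBUG, "epoll backend_fd %d is bad", backend_fd);
        if (!fd_is_valid(fd))
            syslog(LOG_DEBUG, "watched fd %d is bad", fd);
    }

If the backend_fd turns out to be the bad one, something has closed the epoll descriptor behind
libuv's back, e.g. a stray close() on the wrong fd number.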

PR




On 05/08/2021 08:27 AM, pa...@rcom-software.com wrote:
Hi:

An addition to my last message: when uv__nonblock() fails, it indicates a Linux FIONBIO ioctl() failure.
What would cause setting non-blocking mode to fail ?
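
For what it's worth, the usual causes are a descriptor that has already been closed (EBADF) or one
that does not support the ioctl at all (ENOTTY). A small standalone demo of the closed-descriptor
case, just as an illustration:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int on = 1;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        /* Succeeds on an open socket. */
        printf("open fd:   %d\n", ioctl(fd, FIONBIO, &on));

        /* Fails with EBADF once the descriptor has been closed. */
        close(fd);
        printf("closed fd: %d (%s)\n", ioctl(fd, FIONBIO, &on), strerror(errno));
        return 0;
    }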

Best Regards,

Paul R.

On Friday, May 7, 2021 at 9:11:00 PM UTC-7 pa...@rcom-software.com wrote:

    Hi:

    I was able to track down and fix the problem causing uv_poll_init() to return EEXIST.
    However, uv_poll_init() still returns EBADF intermittently, and I traced the problem
    to the following code segment but don't fully understand why.

    The EBADF is generated by the following code segment in uv_poll_init() in
    src/unix/poll.c, around line 90.

      /* If ioctl(FIONBIO) reports ENOTTY, try fcntl(F_GETFL) + fcntl(F_SETFL).
       * Workaround for e.g. kqueue fds not supporting ioctls.
       */
      err = uv__nonblock(fd, 1);
      if (err == UV_ENOTTY)
        if (uv__nonblock == uv__nonblock_ioctl)
          err = uv__nonblock_fcntl(fd, 1);

    Do you have any insight about this ?
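
    (For reference, the fcntl() fallback path amounts to the classic F_GETFL/F_SETFL idiom;
    set_nonblocking() below is just an illustrative name, not the libuv internal.)

      #include <fcntl.h>

      /* Read the current file status flags and set O_NONBLOCK on top of them. */
      static int set_nonblocking(int fd)
      {
          int flags = fcntl(fd, F_GETFL);

          if (flags == -1)
              return -1;
          return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
      }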

    I was able to verify that the following sequence of events can occur, and I strongly
    suspect it can lead to corruption. This is what causes EEXIST to be returned by
    uv_poll_init() calls.


    1) The following sequence of calls executes because it has been decided that polling
       needs to be terminated for a particular TCP connection.

        uv_poll_stop(poll_handle_1);
        uv_close((uv_handle_t *) poll_handle_1, close_callback);

    2) Now it is decided that polling needs to be terminated for another TCP connection,
       and the following sequence of calls executes.

        uv_poll_stop(poll_handle_2);
        uv_close((uv_handle_t *) poll_handle_2, close_callback);

    3) The close_callback() function executes, but we don't know whether it executes first
       for the uv_close() call made with poll_handle_1 or the one made with poll_handle_2.


    Handling the problem requires knowing that close_callback() won't execute until
    uv_run() executes. (I don't think this is documented.)

    The solution is that once a uv_poll_stop()/uv_close() sequence has been started, you
    must not start another such sequence until both close_callback() and uv_run() have
    executed.

    In my case, this requires a variable, Poll_Releasing, which is set to TRUE when such a
    sequence has been initiated and to FALSE when close_callback() executes.
    It also requires putting T_POLL_DISMANTLE_REQ messages that arrive while Poll_Releasing
    is TRUE on a temporary queue. Then, when the IO_Task_Input_Q is empty, the messages from
    the temporary queue are moved to IO_Task_Input_Q and the poll_proxy() routine returns,
    which allows uv_run() to execute.

    The relevant code segments are as follows.

    1) In the IO_Task(), the poll_proxy() routine in network_io.c.


    ROUTINE PRIVATE void poll_proxy(uv_async_t *handle)
    {
        CONN_DESC *cdesc;
        MSG *msg, *rsp;
        MSG_FIFO Poll_Release_Q;
        uv_poll_t *poll_handle;
        uv_handle_t **ppdata;
        int r;
        BOOL done;

        //
        // Do nothing if execution is already in progress.
        //
        if(Poll_Proxy_Busy)
            return;

        Poll_Proxy_Busy = TRUE;
        Poll_Release_Q.head = Poll_Release_Q.tail = NULL;
        //
        // Handle messages from other tasks.
        //
        done = FALSE;
        while(done != TRUE)
          {
            ENTER_MUTEX(&Async_Mutex);
            msg = remove_msg(&IO_Task_Input_Q);
            EXIT_MUTEX(&Async_Mutex);

            if(msg)
              {
                cdesc = (CONN_DESC *) msg->conn_desc;
                syslog(LOG_DEBUG, "IO_TASK(Proxy): RX EVENT - CONN[%d] Type %d\n",
                       cdesc->index, msg->type);

                switch(msg->type)
                {
                case T_POLL_CONFIG_REQ:
                    //
                    // Main process request to start Libuv/epoll() operation.
                    //
                    ... Code to handle message ...
                    break;
                case T_POLL_DISMANTLE_REQ:
                    //
                    // If a poll handle is being dismantled, you must
                    // wait for the uv_close() callback routine to execute
                    // before handling this message. In order for the callback
                    // routine to execute uv_run() must be called.
                    //
                    if(Poll_Releasing)
                      {
                        //
                        // Save the message on a temporary queue
                        // and move the contents of that queue to
                        // the task input queue when this routine returns.
                        //
                        insert_msg(&Poll_Release_Q, msg);
                        msg = NULL;
                        break;
                      }
                    else
                        Poll_Releasing = TRUE;
                    //
                    // Protocol_Task() request to cease Libuv/epoll() operation and
                    // release the poll handle and its resources.
                    //
                    ppdata = (uv_handle_t **) &msg->buffer[0];
                    poll_handle = (uv_poll_t *) *ppdata;

                    if( (r = uv_poll_stop(poll_handle)) )
                      {
                        fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
                                    r, uv_err_name(r) );
                      }
                    //
                    // The callback routine notifies the Protocol_Task() when everything is done.
                    //
                    poll_handle->data = (void *) cdesc;
                    uv_close((uv_handle_t *) poll_handle, async_close_callback);
                    break;
                default:
                    fatal_error(LOG_EMERG, "IO_TASK: POLL_PROXY - Invalid Message = %d", msg->type);
                }

                if(msg)
                    MSG_FREE(msg);
              }
            else
              {
                done = TRUE;
                //
                // The input queue is now empty. Move the messages saved
                // on the temporary queue to it.
                //
                while( (msg = remove_msg(&Poll_Release_Q)) )
                  {
                    ENTER_MUTEX(&Async_Mutex);
                    insert_msg(&IO_Task_Input_Q, msg);
                    EXIT_MUTEX(&Async_Mutex);
                  }
              }
          }

        Poll_Proxy_Busy = FALSE;

        return;
    }


    2) In the IO_Task(), the async_close_callback() routine in async.c.

    ROUTINE void async_close_callback(uv_handle_t *handle)
    {
        MSG *msg;

        CONN_DESC *cdesc = (CONN_DESC *) handle->data;
        int type = handle->type;


        VFREE((UCHAR *) handle);
        //
        // Send a notification message to the Protocol_Task.
        //
        msg = MSG_ALLOC(0, FALSE);
        msg->class = C_NOTIFY;
        msg->type = T_DISMANTLE_RSP;
        msg->info = 0;

        SendProtoMsg(cdesc, msg);
        //
        // Signal that the poll handle is released.
        //
        if(type == UV_POLL)
          {
            Poll_Releasing = FALSE;
            syslog(LOG_INFO, "IO_Task: CONN[%d] Poll Release Complete", cdesc->index);
          }

        return;
    }





    On 05/04/2021 06:21 PM, Paul Romero wrote:
    Hi:

    I have two questions about Libuv that I haven't seen addressed by
    the documentation.

    First, when uv_poll_init() returns EEXIST, could it be indicative
    of a transient
    problem rather than an operational problem or bug ?

    Second, this question is related to concurrency but does not involve concurrent execution.
    In fact, all the Libuv calls execute serially.

    1) The following sequence of calls executes because it has been decided that polling
       needs to be terminated for a particular TCP connection.

        uv_poll_stop(poll_handle_1);
        uv_close((uv_handle_t *) poll_handle_1, close_callback);

    2) Now it is decided that polling needs to be terminated for another TCP connection,
       and the following sequence of calls executes.

        uv_poll_stop(poll_handle_2);
        uv_close((uv_handle_t *) poll_handle_2, close_callback);

    3) The close_callback() function executes, but we don't know whether it executes first
       for the uv_close() call made with poll_handle_1 or the one made with poll_handle_2.

    Can this happen and result in corruption ?
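
    For what it's worth, the usual way to keep the two callbacks apart is to stash per-connection
    context in the handle's public data field before calling uv_close(); the callbacks may then run
    in either order without ambiguity. A minimal sketch (CONN_DESC, conn_finished() and
    stop_polling() are placeholders for the application's own names):

        #include <stdlib.h>
        #include <uv.h>

        static void close_callback(uv_handle_t *handle)
        {
            /* handle->data was set before uv_close(), so this callback knows
             * which connection it belongs to regardless of callback order. */
            CONN_DESC *cdesc = handle->data;

            free(handle);            /* the handle memory may be freed here */
            conn_finished(cdesc);    /* per-connection cleanup */
        }

        static void stop_polling(uv_poll_t *poll_handle, CONN_DESC *cdesc)
        {
            uv_poll_stop(poll_handle);
            poll_handle->data = cdesc;
            uv_close((uv_handle_t *) poll_handle, close_callback);
        }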

    Best Regards,

    Paul R.

    On 05/02/2021 06:51 AM, Jameson Nash wrote:
    You might want to try compiling your application with Clang TSan. That said, since libuv
    is a single-threaded library (per uv_loop_t), any access from multiple threads is
    generally just forbidden.

    On Sun, May 2, 2021 at 2:24 AM Paul Romero
    <pa...@rcom-software.com> wrote:

        Hi:

        A trace giving a more refined picture of the problem, with more details, is attached
        to this note. It dumps what I consider to be the significant fields of the uv_loop_t
        structure (the Poll_Loop in my code) and of the dynamically allocated uv_poll_t
        structure surrounding the essential Libuv API routine calls. I think the format of
        the dump is fairly intuitive, but please let me know if you disagree.

        My comments about what is happening are identified by the
        string "COMMENT:" and the
        API calls are identified by the string "MPOLL". The trace is
        redacted to eliminate
        irrelevant events.

        Please let me know if you see something amiss about the data
        structure fields at
        any point.

        My server code is based on a simple event-oriented architecture and uses the
        following 5 concurrent tasks.

        1) main(): Handles incoming connections and Linux signals. (Libuv based.)
        2) IO_Task(): Handles incoming data from the network. (Libuv based.)
        3) Protocol_Task(): Does most of the work. (Message based.)
        4) DB_Task(): Does database insertions. (Message based.)
        5) Timer_Task(): Handles timer management. (Mostly Libuv based.)

        I can send you my server code and a test tool if it would be
        useful to you.
        However, it may take a few days to put it in an easily
        useable form.

        As previously mentioned uv_accept() executes in the main()
        task and the
        rest of the polling related Libuv API routines execute in
        the IO_Task().

        Finally, remember the problem is intermittent. Generally, it takes about
        45 minutes to 3 hours of constant concurrent connections for the crash to occur.

        Best Regards,

        Paul R.




        On 05/01/2021 12:01 AM, Paul Romero wrote:
        Hi:

        I may have overlooked the following factor.

        We know that Libuv is not reentrant and not really
        multi-task safe. I suspect, but
        don't know, that the internal Libuv data structures
        attached to the uv_loop_t structure,
        Poll_Loop in my case, are not protected by internal Libuv
        Mutexes.

        1) uv_accept() and uv_fileno() execute in the main() thread.

        2) uv_poll_init(), uv_poll_start() and uv_poll_stop(),
        execute in the IO_Task() thread.

        That means uv_accept() and the Libuv API routines in 2, above, can execute
        concurrently. Is it possible that such concurrency could damage the internal Libuv
        data structures attached to the uv_loop_t structure ?  (Again, Poll_Loop in my case.)
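
        For what it's worth, the internal loop structures are not protected by mutexes; the usual
        pattern is to confine each uv_loop_t to a single thread and hand requests across threads with
        uv_async_send(), the one libuv call documented as safe to make from another thread. A rough
        sketch (REQ, queue_push(), queue_pop() and handle_request() are placeholders for the
        application's own thread-safe queue and dispatch):

            #include <uv.h>

            static uv_async_t wakeup;            /* initialized on the loop thread */

            /* Runs on the loop thread, inside uv_run(). */
            static void on_wakeup(uv_async_t *a)
            {
                REQ *req;

                while ((req = queue_pop()) != NULL)
                    handle_request(req);         /* uv_poll_init()/uv_poll_start() etc. */
            }

            /* Called from the accepting (main) thread. */
            static void submit_request(REQ *req)
            {
                queue_push(req);                 /* thread-safe queue */
                uv_async_send(&wakeup);          /* wake the loop thread */
            }

        uv_async_init(loop, &wakeup, on_wakeup) would have to run on the loop thread before
        submit_request() is ever called.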

        Best Regards,

        Paul R.

        NOTE: I did a few experiments and noticed that calling uv_close() while holding a
        Mutex often prevents its callback routine from executing.

        On 04/30/2021 04:59 PM, Paul Romero wrote:

        Hi:

        I think I've isolated the problem and it appears to be in uv_poll_stop().

        In particular, if uv_accept() is called immediately after uv_poll_stop(), and it
        reallocates the same socket descriptor just freed by uv_poll_stop(), then
        uv_poll_init() fails with the error EEXIST or EBADF. This must have something to do
        with the uv_loop_t handle passed to uv_poll_init(), called Poll_Loop in my code.

        I had to instrument my code to keep track of Libuv API calls to see this.

        The following explanatory items follow.

        A - The sequence of calls that produces the problem.
        B - Commented syslog() diagnostics.
        C - The relevant code segments.
        D - The gdb stack backtrace.

        Best Regards,

        Paul R.


        A) The sequence of relevant calls is as follows; the problem only occurs with this
        exact sequence of events.
        ****************************************************************************************

        1) uv_poll_stop(): The uv_poll_t handle is bound to socket descriptor N.
           This should free socket descriptor N.
        2) uv_accept(): Bound to socket descriptor N as set by uv_fileno().
           Socket descriptor N should be newly allocated.
        3) uv_poll_init(): Returns EBADF with socket descriptor N, and Poll_Loop.watchers[N]
           is NULL. Something is wrong !
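
        One ordering that removes the reuse window is to keep the socket descriptor open until
        libuv has run the close callback for the old poll handle, so a later accept() cannot hand
        back the same fd number while the old watcher is still being torn down. A rough sketch
        under that assumption (CONN_DESC, conn_release() and dismantle_poll() are placeholders,
        not the actual application code):

            #include <stdlib.h>
            #include <unistd.h>
            #include <uv.h>

            static void poll_close_cb(uv_handle_t *handle)
            {
                CONN_DESC *cdesc = handle->data;

                free(handle);
                close(cdesc->fd);     /* only now can the kernel reuse this fd number */
                conn_release(cdesc);
            }

            static void dismantle_poll(uv_poll_t *poll_handle, CONN_DESC *cdesc)
            {
                uv_poll_stop(poll_handle);
                poll_handle->data = cdesc;
                uv_close((uv_handle_t *) poll_handle, poll_close_cb);
                /* Do NOT close(cdesc->fd) here; wait for poll_close_cb(). */
            }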

        B) The commented syslog() diagnostic sequence from the instrumented code.
        **************************************************************************
        * Stop Polling - Connection 1 Fd 21. This frees the FD.

        Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 21] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ

        * Stop Polling - Connection 0 Fd 20. This frees the FD.

        Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 0 fd 20] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ

        * New Connection - Connection 1 Fd 20. This is fine since uv_poll_stop() was called
          for Fd 20, which freed it, and it is newly allocated with uv_accept() and obtained
          with uv_fileno().

        Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - uv_accept() + uv_fileno()

        * However, we crash because uv_poll_init() returned EBADF !

        Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - uv_poll_init() - T_POLL_CONFIG_REQ
        Apr 30 15:46:08 mobius pexd[30963]: IO_TASK: POLL_PROXY - Polling Initialization Error -9, EBADF

        C) The edited, associated code segments are as follows.
        ********************************************************

        1) Prelude to uv_poll_stop() in poll_proxy() in the IO_Task() thread. (network_io.c)
        ------------------------------------------------------------------------------------
                    case T_POLL_DISMANTLE_REQ:
                        ppdata = (uv_handle_t **) &msg->buffer[0];
                        poll_handle = (uv_poll_t *) *ppdata;

                        syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ\n",
                               cdesc->index, cdesc->fd);

                        if( (r = uv_poll_stop(poll_handle)) )
                          {
                            fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
                                        r, uv_err_name(r) );
                          }
                        //
                        // The callback routine notifies the Protocol_Task() when everything is done.
                        //
                        poll_handle->data = (void *) cdesc;
                        uv_close((uv_handle_t *) poll_handle, async_close_callback);

        2) Connection acceptance in make_incoming_connection() in the main() thread. (main.c)
        ---------------------------------------------------------------------------------------
                conn_handle = (uv_tcp_t *) VALLOC(sizeof(uv_tcp_t));
                uv_tcp_init(&Connect_Loop, conn_handle);

                if((r = uv_accept(listen_handle, (uv_stream_t *) conn_handle)) == 0)
                  {
                    //
                    // A new connection occurred.
                    //
                    uv_os_fd_t fd;

                    ENTER_MUTEX(&Protocol_D_Mutex);
                    nsock = N_Sockets++;
                    EXIT_MUTEX(&Protocol_D_Mutex);
                    //
                    // Fetch the socket descriptor from the connection handle.
                    //
                    uv_fileno((const uv_handle_t*) conn_handle, &fd);
                    //
                    // Allocate the connection descriptor.
                    //
                    cdesc = ALLOC_CONN_DESC(fd);
                    if( !cdesc )
                      {
                        syslog(LOG_DEBUG, "MAIN_TASK: CONNECTION REJECTION - No Connection Descriptors, N = %d", nsock);

                        ENTER_MUTEX(&Protocol_D_Mutex);
                        N_Sockets--;
                        close(fd);
                        EXIT_MUTEX(&Protocol_D_Mutex);
                        uv_close((uv_handle_t *) conn_handle, forced_close_callback);
                        return;
                      }

                    syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_accept() + uv_fileno()\n",
                           cdesc->index, cdesc->fd);
                    //
                    // Save the connection handle and start polling.
                    //
                    cdesc->conn_handle = (uv_tcp_t *) conn_handle;

                    syslog(LOG_INFO, "MAIN_TASK: NEW CONNECTION ESTABLISHED - CONN %d FD %d", cdesc->index, cdesc->fd);
                    //
                    // Set up epoll() plumbing by sending a message to IO_Task();
                    //
                    bzero((void *) &poll_data, sizeof(ASYNC_DATA));
                    poll_data.type = T_POLL_CONFIG_REQ;
                    poll_data.async_handle = &IO_Task_Async_Handle;
                    poll_data.async_q = &IO_Task_Input_Q;
                    poll_data.object_handle = NULL;
                    //
                    // The T_POLL_CONFIG_RSP message will be sent to the Protocol_Task() which
                    // is in S_IDLE state.
                    //
                    send_async_msg(cdesc, &poll_data);
                  }

        3) Prelude to uv_poll_init() in poll_proxy() in the IO_Task() thread. (network_io.c)
        ------------------------------------------------------------------------------------
                    case T_POLL_CONFIG_REQ:
                        //
                        // Main process request to start Libuv/epoll() operation.
                        //
                        poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));

                        syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_init() - T_POLL_CONFIG_REQ\n",
                               cdesc->index, cdesc->fd);

                        if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
                          {
                            fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
                                        r, uv_err_name(r));
                          }



        D) Stack backtrace.

        (gdb) bt
        #0  0x00007f62c47b0267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
        #1  0x00007f62c47b1eca in __GI_abort () at abort.c:89
        #2  0x000000000040d411 in fatal_error (log_prio=2,
            fmt=0x424e28 "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s") at misc.c:135
        #3  0x0000000000405b6d in poll_proxy (handle=0x641f80 <IO_Task_Async_Handle>) at network_io.c:291
        #4  0x00000000004114cb in uv__async_io (loop=0xc26800 <Poll_Loop>, w=<optimized out>, events=<optimized out>)
            at src/unix/async.c:163
        #5  0x000000000041d255 in uv__io_poll (loop=loop@entry=0xc26800 <Poll_Loop>, timeout=<optimized out>)
            at src/unix/linux-core.c:462
        #6  0x0000000000411c48 in uv_run (loop=0xc26800 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:385
        #7  0x0000000000405835 in IO_Task (arg=0x0) at network_io.c:100
        #8  0x00007f62c4b4c6aa in start_thread (arg=0x7f62bbfff700) at pthread_create.c:333
        #9  0x00007f62c4881eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

        At this point cdesc->fd is 20 and Poll_Loop.watchers[20] is NULL.

        On 04/30/2021 11:48 AM, Paul Romero wrote:
        Hi:

        My guess is that there is a subtle, probably timing-related, problem in uv_poll_stop(),
        simply because I can't think of any other relevant Libuv routines that are explicitly
        called after uv_poll_start() succeeds. What do you think ? Of course, there could be a
        problem in the way connections are rejected. All that code is in the
        make_incoming_connection() callback routine in main.c.

        Best Regards,

        Paul R.


        On 04/29/2021 03:14 PM, Paul Romero wrote:
        Hi:

        I reattached the following files with some confusing debugging output removed;
        a dump using them follows.

        * main.c
        * network_io.c
        * rx_fsm.c
        * async.c

        Again this is the relevant code segment.

            case T_POLL_CONFIG_REQ:
                //
                // Main process request to start Libuv/epoll() operation.
                //
                poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));

                if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
                  {
                    fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
                                r, uv_err_name(r));
                  }

        This is the crash diagnostic.

        IO_TASK: POLL_PROXY - Polling Initialization Error -17, EEXIST
        Aborted (core dumped)

        And this is the gdb backtrace, which is typical of the problem.

        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        Core was generated by `./pexd'.
        Program terminated with signal SIGABRT, Aborted.
        #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
        55 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        (gdb) bt
        #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
        #1  0x00007ff1da90ceca in __GI_abort () at abort.c:89
        #2  0x000000000040d379 in fatal_error (log_prio=2,
            fmt=0x424ce0 "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s") at misc.c:135
        #3  0x0000000000405b21 in poll_proxy (handle=0x641f80 <IO_Task_Async_Handle>) at network_io.c:287
        #4  0x000000000041142b in uv__async_io (loop=0xc26800 <Poll_Loop>, w=<optimized out>, events=<optimized out>)
            at src/unix/async.c:163
        #5  0x000000000041d1b5 in uv__io_poll (loop=loop@entry=0xc26800 <Poll_Loop>, timeout=<optimized out>)
            at src/unix/linux-core.c:462
        #6  0x0000000000411ba8 in uv_run (loop=0xc26800 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:385
        #7  0x000000000040580f in IO_Task (arg=0x0) at network_io.c:100
        #8  0x00007ff1daca76aa in start_thread (arg=0x7ff1d5a7c700) at pthread_create.c:333
        #9  0x00007ff1da9dceed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

        At this point cdesc->fd is 22 and Poll_Loop.watchers[22] = 0x17ef458, a valid address ?

        On 04/29/2021 09:09 AM, Paul Romero wrote:
        Hi:

        The most common symptom of the problem is that uv_poll_init() fails, which means
        the problem must be related to the uv_loop_t Poll_Loop. The relevant code segment
        is as follows.

           poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));

           if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
             {
               fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
                           r, uv_err_name(r));
             }

        The following files illustrate the problem and are
        attached to this message. Please advise me
        how to attach files to the list directly.

        * trace_badf.txt: A stack trace when the error code is EBADF.
        * trace_eexist.txt: A stack trace when the error code is EEXIST.
        * network_io.c: The IO_Task() thread code. The problem occurs in the poll_proxy() routine.
        * main.c: The main() process/thread code where most initialization occurs.
        * transmit.c: The Transmit_Task() thread.
        * rx_fsm.c: The heart of the protocol task code.


        You are right that only the IO_Task() performs polling
        related operations.
        However, please note that the Transmit_Task() thread
        uses the underlying TCP socket descriptor
        to send data to the network with Linux send(). (See the
        TX_DATA() routine.) Also,
        USE_SSL and USE_SCRAM are not defined.

        I will send more information if I am able to reproduce the problem at another point.
        Also, I have seen the issue you mentioned before and suspect it may be relevant.
        Please let me know if you need anything else.


        Best Regards,

        Paul R.








        On 04/28/2021 08:52 PM, Jameson Nash wrote:
        I'm assuming from your description that you've ensured all libuv calls happen from
        exactly one thread (except uv_async_send). It sounds like you're describing the issue
        that
        https://github.com/libuv/libuv/commit/c9406ba0e3d67907c1973a71968b89a6bd83c63c
        was intended to fix, which was included in v1.41.0.

        Note that Poll_Loop.watchers[N] is not always required to be NULL, so perhaps you've
        found a case where it is still incorrectly expecting that? You might want to file an
        issue with code to reproduce the problem, or at least an `rr` trace dump of the code
        that hits the problem (note that `rr` could potentially upload all of your hard drive,
        so be careful what you are submitting).

        On Wed, Apr 28, 2021 at 11:13 PM
        pa...@rcom-software.com <pa...@rcom-software.com> wrote:

            Hi Folks:

            I am experiencing an intermittent problem with
            uv_poll_init() after the successful establishment
            and release
            of multiple concurrent TCP connections. I am not
            sure if this problem is due to a bug, which may
            be corrected in another Libuv release, or if I am
            doing something wrong when releasing the poll handle.
            Do you have insight about the cause ? The details
            I consider most important follow.

            The uv_poll_t callback function reads incoming TCP
            data as follows.

                if(events & UV_READABLE)
                  {
                    CONN_DESC *cdesc = (CONN_DESC *) poll_handle->data;
                    n = recv(cdesc->fd, cdesc->rx_buf, RX_BUF_SIZE, MSG_DONTWAIT);

            NOTE: I am using Libuv version 1.41.0 running on
            Ubuntu Linux version 15.04.

            The problem is that uv_poll_init() fails, normally with the EEXIST or EBADF
            error code, and my investigation indicates the uv_loop_t Poll_Loop.watchers[N]
            field is not NULL when it should be, where N is the TCP socket descriptor. It
            occurs immediately after the uv_poll_t poll_handle is allocated.
            (There is exactly one TCP socket descriptor per poll handle.) The call to
            uv_poll_init() is as follows, and the socket descriptor is obtained with
            uv_fileno().

                if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
                   {
                    fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
                                r, uv_err_name(r));
                   }

            It occurs in the IO_Task() thread when multiple TCP socket descriptors are in
            use. The IO_Task() releases the poll_handle with the following code sequence
            when it is notified, via a Libuv async message from the Protocol_Task() thread,
            that polling should stop.

                if( (r = uv_poll_stop(poll_handle)) )
                  {
                    fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
                                r, uv_err_name(r) );
                  }

                poll_handle->data = (void *) cdesc;
                uv_close((uv_handle_t *) poll_handle, async_close_callback);

            The actual release occurs in async_close_callback() as follows, and the
            Protocol_Task() releases the TCP socket descriptor with a normal Linux close()
            after it receives the T_DISMANTLE response message.

                VFREE((UCHAR *) handle);
                //
                // Send a notification message to the Protocol_Task.
                //
                msg = MSG_ALLOC(0, FALSE);
                msg->class = C_NOTIFY;
                msg->type = T_DISMANTLE_RSP;
                msg->info = 0;

                SendProtoMsg(cdesc, msg);

            I think the underlying cause is that if a new TCP connection uses the same TCP
            socket descriptor as one released with uv_poll_stop() and uv_close(), the call
            to uv_poll_init() occurs before the socket closure has propagated into the
            uv_loop_t Poll_Loop.

            Best Regards,

            Paul Romero

--


Paul Romero
-----------
RCOM Communications Software
EMAIL: pa...@rcom-software.com
PHONE: (510)482-2769




--
You received this message because you are subscribed to the Google Groups 
"libuv" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to libuv+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/libuv/6096B31F.1090006%40rcom-software.com.
