Hi,

I'm trying to fix an issue in a custom MPM called peruser. It is more or
less prefork, with pools of processes running as different users. An
additional pool of processes, called Multiplexers, accepts connections
and passes them to the Workers. Each worker pool has its own socket pair
(socketpair(PF_UNIX, SOCK_STREAM)), one end for the Multiplexers and the
other for the Workers. A Multiplexer sends the connection's socket and
the request data to a Worker with a blocking sendmsg(); the Workers use a
non-blocking recvmsg().
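For anyone unfamiliar with the mechanism, here is a minimal sketch of the
kind of exchange described above: one sendmsg() that carries payload bytes
plus a descriptor as SCM_RIGHTS ancillary data over an AF_UNIX socketpair.
send_with_fd() and recv_with_fd() are hypothetical helpers, not peruser's
actual functions, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes plus one descriptor in one sendmsg(). */
static ssize_t send_with_fd(int sock, const void *data, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
    union {                          /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);   /* flags 0: blocks if the buffer is full */
}

/* Hypothetical helper: receive up to len bytes; *fd_out is -1 if no fd came. */
static ssize_t recv_with_fd(int sock, void *data, size_t len, int *fd_out)
{
    struct iovec iov = { .iov_base = data, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    *fd_out = -1;
    ssize_t n = recvmsg(sock, &msg, 0);
    if (n <= 0)
        return n;                    /* -1 = error, 0 = peer closed */

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
            memcpy(fd_out, CMSG_DATA(c), sizeof(int));
    }
    return n;
}
```

In peruser's case the descriptor would presumably be the accepted client
connection, and the payload the request data.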

The code looks like this.

In the Workers, in receive_from_multiplexer():
...
    // Don't block
    ret = recvmsg(ctrl_sock_fd, &msg, MSG_DONTWAIT);

    if (ret == -1 && errno == EAGAIN) {
        _DBG("receive_from_multiplexer recvmsg() EAGAIN, someone was faster");

        return APR_EAGAIN;
    }
    else if (ret == -1) {
        _DBG("recvmsg failed with error \"%s\"", strerror(errno));
        return APR_EGENERAL;
    }
    else _DBG("recvmsg returned %d", ret);

In the Multiplexers:

    if ((rv = sendmsg(processor->senv->output, &msg, 0)) == -1)
    {
        apr_pool_destroy(r->pool);
        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf,
                     "Writing message failed %d %d", rv, errno);
        return -1;
    }
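The Worker excerpt treats any non-negative return from recvmsg() as
success. A sketch of a stricter check, assuming the Worker knows how many
bytes the current message should contain (how it would know that, e.g.
from a fixed-size header, is an assumption here). classify_recv() and the
enum are hypothetical names. A short count means the remainder is still
queued on the stream for the next reader; MSG_CTRUNC means a passed
descriptor was dropped because the control buffer was too small.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for one non-blocking receive attempt. */
enum recv_status {
    RECV_OK,       /* whole message and its control data received      */
    RECV_AGAIN,    /* nothing queued (or interrupted): try again later  */
    RECV_CLOSED,   /* the other end of the socketpair has been closed   */
    RECV_SHORT,    /* fewer bytes than expected: rest is still queued   */
    RECV_CTRUNC,   /* control buffer too small: descriptor was dropped  */
    RECV_ERROR
};

/* Classify ret = recvmsg(fd, msg, MSG_DONTWAIT) with err = errno after it.
 * `expected` is the assumed total size of the message being read. */
static enum recv_status classify_recv(ssize_t ret, int err,
                                      const struct msghdr *msg,
                                      size_t expected)
{
    if (ret == -1) {
        if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
            return RECV_AGAIN;
        return RECV_ERROR;
    }
    if (ret == 0)
        return RECV_CLOSED;
    if (msg->msg_flags & MSG_CTRUNC)
        return RECV_CTRUNC;
    if ((size_t)ret < expected)
        return RECV_SHORT;
    return RECV_OK;
}
```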


The problem is that sometimes a Multiplexer gets stuck in sendmsg() while
a Worker is stuck in recvmsg(). The OS is Linux 2.6.32 on amd64. This is
what strace shows for a stuck Multiplexer:

sendmsg(74, {msg_name(0)=NULL, msg_iov(5)=[{"y\1\0\0\0\0\0\0", 8},
{"\0\0\0\0\0\0\0\0", 8},
{"\230\322\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\364\351
\0\0\2\0\0\0\20\0\0\0\4\0\0\0\20\0\0\0\0\0\0\0l\324\265\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\351\364C\303p
\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0F\251\266\0\0\0\0\0@\24
5\256\2\0\0\0\0\370\222\266\0\0\0\0\0`\30\252\366\377\177\0\0\320\30\252\366\377\177\0\0\5\0\0\0\0\0\0\0\364\230\254\242a\177\0\0\6\0\0\0\1\0\0\0\5\0\0\0\1\0\0\0\4\0\0\0\1\0\0\0\3\0\0\0\1\0\1\0\213\0\0\0\1\0\0\0\220\361\5\0\0\0\0\0", 192},
{"GET /oglxxxxx.html HTTP/1.0\r\nHost: xxxxxxx \r\n
User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n\r\n\0", 378}, {"", 0}],
msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, {151}}, msg_flags=MSG_PROXY|MSG_DONTWAIT}, 0 <unfinished ...>

Killing the destination Workers unblocks all of the Multiplexers.
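One reading that fits the Multiplexer side of this is ordinary
back-pressure on the stream socket: a blocking sendmsg() sleeps while the
other end's receive buffer is full, and it only returns (with EPIPE) once
every descriptor for the Workers' end has been closed, which killing all
the destination Workers would do. A minimal sketch on a plain socketpair,
assuming Linux; fill_until_eagain() and send_errno_after_peer_close() are
made-up names, and MSG_NOSIGNAL is used only so the demo is not killed by
SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue chunk-sized writes without blocking until the kernel refuses more.
 * Returns the bytes queued; errno holds the last send() error (normally
 * EAGAIN). A blocking sendmsg() issued at this point would sleep instead. */
static size_t fill_until_eagain(int sock, size_t chunk)
{
    char buf[4096] = {0};
    size_t total = 0;

    if (chunk == 0 || chunk > sizeof(buf))
        chunk = sizeof(buf);
    for (;;) {
        ssize_t n = send(sock, buf, chunk, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n < 0)
            return total;   /* socket buffer is full */
        total += (size_t)n;
    }
}

/* Close the reader end, then try to send again. With several Workers the
 * peer only counts as gone once ALL copies of that end are closed. */
static int send_errno_after_peer_close(int writer, int reader)
{
    close(reader);
    if (send(writer, "x", 1, MSG_DONTWAIT | MSG_NOSIGNAL) == -1)
        return errno;       /* expected: EPIPE */
    return 0;
}
```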

I think the problem might be in receive_from_multiplexer(). If a message
is, e.g., only half received, the code never goes back to read the rest
of it. receive_from_multiplexer() is called after apr_poll() in multiple
Workers, so there is no guarantee that the same Worker will come back to
read the remainder, and the leftover data blocks the socket for other
messages.
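A quick way to check this part of the theory: a SOCK_STREAM socket keeps
no message boundaries, so if a recvmsg() asks for fewer bytes than the
Multiplexer wrote, the rest stays queued and the next Worker to wake up
starts reading in the middle of someone else's message. The sketch below
compares that with SOCK_SEQPACKET, which is not what peruser uses and is
shown only as a possible alternative: it delivers each send as one record
and discards the excess (flagging MSG_TRUNC) instead of leaving it for the
next reader. The function name and the 586/100-byte sizes are arbitrary;
586 only mirrors the iovec total in the strace output.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one 586-byte message, read it with a 100-byte buffer, then report
 * how many bytes are still queued for the next reader (0 if none). */
static ssize_t leftover_after_short_read(int type)
{
    int sv[2];
    char msg[586], small[100], rest[1024];
    ssize_t left = -1;

    memset(msg, 'A', sizeof(msg));
    if (socketpair(AF_UNIX, type, 0, sv) != 0)
        return -1;

    if (send(sv[0], msg, sizeof(msg), 0) == (ssize_t)sizeof(msg) &&
        recv(sv[1], small, sizeof(small), 0) == (ssize_t)sizeof(small)) {
        /* SOCK_STREAM: the tail is still queued.
         * SOCK_SEQPACKET: the tail was discarded with the record. */
        left = recv(sv[1], rest, sizeof(rest), MSG_DONTWAIT);
        if (left == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            left = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return left;
}
```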

I know this is not httpd code, but the peruser mailing list is dead, and
I don't know where else to go with this.

--
Michal Grzedzicki
