Ok, I'll take a look.
--Michi

On Fri, Jun 8, 2012 at 3:17 PM, Patrick Hunt <[email protected]> wrote:
> Speaking of Windows, Michi, can you take a look at why the Windows job
> has started failing of late? Perhaps an environment change? (You might
> look at other Windows jobs on that box to get an idea.)
>
> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/
>
> Thanks!
>
> Patrick
>
> On Fri, Jun 8, 2012 at 10:16 AM, Michi Mutsuzaki <[email protected]>
> wrote:
>> I think there is a bug in the Windows port (are you on Windows?) that
>> doesn't set the recursive attribute for the to_send mutex. Please open
>> a jira:
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER
>>
>> Thanks!
>> --Michi
>>
>> On Fri, Jun 8, 2012 at 1:00 AM, 乱麻的魅力 <[email protected]> wrote:
>>> Hi dev,
>>> I am trying to use the ZooKeeper CLI (the C version) to connect to a
>>> ZooKeeper server. The client connects, but it can't send any command
>>> to ZK, such as "ls /". As soon as I send a command, zk-cli deadlocks
>>> at this line: lock_buffer_list(list), line 00945 of zookeeper.c, in
>>> the dequeue_buffer() function. I then tried to track the problem down.
>>>
>>> I downloaded the ZK CLI (version 3.4.3) from
>>> http://labs.renren.com/apache-mirror/zookeeper/ and rebuilt the
>>> project. The bug is at line 00945 of
>>> zookeeper-3.4.3.tar.gz\zookeeper-3.4.3\src\c\src\zookeeper.c there
>>> too. I describe the case below:
>>>
>>> 1 When the client sends a command to the ZK server, it has to call a
>>> function such as zoo_awget, send_ping, zoo_aget, etc. All of these
>>> functions call adaptor_send_queue(zh, 0); then see below...
>>>
>>> 2 adaptor_send_queue(zh, 0) calls flush_send_queue(zh, timeout);
>>>
>>> int flush_send_queue(zhandle_t*zh, int timeout)
>>> {
>>>     int rc= ZOK;
>>>     struct timeval started;
>>> #ifdef WIN32
>>>     fd_set pollSet;
>>>     struct timeval wait;
>>> #endif
>>>     gettimeofday(&started,0);
>>>     // we can't use dequeue_buffer() here because if (non-blocking)
>>>     // send_buffer() returns EWOULDBLOCK we'd have to put the buffer
>>>     // back on the queue. we use a recursive lock instead and only
>>>     // dequeue the buffer if a send was successful
>>>     lock_buffer_list(&zh->to_send); /* first lock of the buffer list,
>>>                                        wfs 20120608 */
>>>     while (zh->to_send.head != 0 && zh->state == ZOO_CONNECTED_STATE) {
>>>         if(timeout!=0){
>>>             int elapsed;
>>>             struct timeval now;
>>>             gettimeofday(&now,0);
>>>             elapsed=calculate_interval(&started,&now);
>>>             if (elapsed>timeout) {
>>>                 rc = ZOPERATIONTIMEOUT;
>>>                 break;
>>>             }
>>> #ifdef WIN32
>>>             wait = get_timeval(timeout-elapsed);
>>>             FD_ZERO(&pollSet);
>>>             FD_SET(zh->fd, &pollSet);
>>>             // Poll the socket
>>>             rc = select((int)(zh->fd)+1, NULL, &pollSet, NULL, &wait);
>>> #else
>>>             struct pollfd fds;
>>>             fds.fd = zh->fd;
>>>             fds.events = POLLOUT;
>>>             fds.revents = 0;
>>>             rc = poll(&fds, 1, timeout-elapsed);
>>> #endif
>>>             if (rc<=0) {
>>>                 /* timed out or an error or POLLERR */
>>>                 rc = rc==0 ? ZOPERATIONTIMEOUT : ZSYSTEMERROR;
>>>                 break;
>>>             }
>>>         }
>>>         rc = send_buffer(zh->fd, zh->to_send.head);
>>>         if(rc==0 && timeout==0){
>>>             /* send_buffer would block while sending this buffer */
>>>             rc = ZOK;
>>>             break;
>>>         }
>>>         if (rc < 0) {
>>>             rc = ZCONNECTIONLOSS;
>>>             break;
>>>         }
>>>         // if the buffer has been sent successfully, remove it from the queue
>>>         if (rc > 0)
>>>             remove_buffer(&zh->to_send); /* this function locks the
>>>                 buffer list a second time while it is already locked,
>>>                 wfs 20120608 */
>>>
>>>         gettimeofday(&zh->last_send, 0);
>>>         rc = ZOK;
>>>     }
>>>     unlock_buffer_list(&zh->to_send);
>>>     return rc;
>>> }
>>>
>>> static int remove_buffer(buffer_head_t *list)
>>> {
>>>     buffer_list_t *b = dequeue_buffer(list);
>>>     if (!b) {
>>>         return 0;
>>>     }
>>>     free_buffer(b);
>>>     return 1;
>>> }
>>>
>>> static buffer_list_t *dequeue_buffer(buffer_head_t *list)
>>> {
>>>     buffer_list_t *b;
>>>     lock_buffer_list(list); /* this locks the buffer list a second
>>>         time while it is already locked (20120608), which makes the
>>>         function deadlock at this line.
>>>
>>>         If I write new versions of dequeue_buffer(buffer_head_t *list)
>>>         and remove_buffer() without the lock and unlock, to be called
>>>         by flush_send_queue, then zk-cli can send commands to the
>>>         ZooKeeper server and the client doesn't deadlock */
>>>
>>>     b = list->head;
>>>     if (b) {
>>>         list->head = b->next;
>>>         if (!list->head) {
>>>             assert(b == list->last);
>>>             list->last = 0;
>>>         }
>>>     }
>>>     unlock_buffer_list(list);
>>>     return b;
>>> }
>>>
>>> I don't know whether I have described this case in enough detail. I
>>> find that the older version 3.3.3 has this bug too. I think this C
>>> source code may never have been tested, or I am using it the wrong
>>> way. Can you help me clear this up?
>>>
>>> Thanks!
>>> wfs from China, 20120608
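
For reference, here is a minimal sketch of what the recursive attribute
mentioned in this thread changes, assuming POSIX threads. The list_t and
node_t types and the list_* functions are hypothetical stand-ins, not the
client's buffer_head_t or its functions; only the pthread calls are real
API. list_drain() holds the lock and calls list_pop(), which takes the
same lock again, mirroring the flush_send_queue() -> remove_buffer() ->
dequeue_buffer() path in the report.

```c
#define _XOPEN_SOURCE 700   /* makes PTHREAD_MUTEX_RECURSIVE visible */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the client's buffer list (assumption). */
typedef struct node { struct node *next; } node_t;

typedef struct {
    node_t *head;
    node_t *last;
    pthread_mutex_t lock;
} list_t;

/* Initialise the list with a recursive mutex, so the thread that owns
 * the lock may take it again without blocking. Returns 0 on success. */
int list_init(list_t *l)
{
    pthread_mutexattr_t attr;
    int rc;

    l->head = NULL;
    l->last = NULL;
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&l->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Append a node at the tail. */
void list_push(list_t *l, node_t *n)
{
    pthread_mutex_lock(&l->lock);
    n->next = NULL;
    if (l->last)
        l->last->next = n;
    else
        l->head = n;
    l->last = n;
    pthread_mutex_unlock(&l->lock);
}

/* Takes the lock itself, like dequeue_buffer() in the quoted code. */
node_t *list_pop(list_t *l)
{
    node_t *n;

    pthread_mutex_lock(&l->lock);      /* nested lock when called from list_drain */
    n = l->head;
    if (n) {
        l->head = n->next;
        if (!l->head)
            l->last = NULL;
    }
    pthread_mutex_unlock(&l->lock);
    return n;
}

/* Holds the lock while popping, like flush_send_queue().
 * Returns the number of nodes removed. */
int list_drain(list_t *l)
{
    int count = 0;

    pthread_mutex_lock(&l->lock);      /* first lock */
    while (l->head != NULL) {
        if (list_pop(l) != NULL)       /* second lock, same thread */
            count++;
    }
    pthread_mutex_unlock(&l->lock);
    return count;
}
```

If the mutex were created with default attributes instead, relocking it
from the same thread is undefined and typically blocks forever, which
matches the hang reported at lock_buffer_list(list). The alternative the
reporter describes, an unlocked dequeue helper called only while the lock
is already held, avoids the nested lock altogether.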
