On Fri, 2006-10-27 at 00:32 -0400, Joe Todaro wrote:
>
> Hi,
>
> Has anyone ever seen this error before in their *plague-0.5.0* build
> environment? It surfaced last week shortly after we started
> stress-testing our buildsystem. In fact, there were three such
> errors in all, which I will post separately to avoid any confusion.
> This is one of three. It was triggered when we requested status
> about a job we killed before it actually got handed-off to archjobs.
>
> ====== THE ERROR ======
> Request to enqueue 'stacker' tag 'stacker-1_3-5' for target
> 'oc-rhel4-dev' (user '[EMAIL PROTECTED]')
> 66 (stacker): Starting tag 'stacker-1_3-5' on target 'oc-rhel4-dev'
> 66 (stacker): Requesting depsolve...
> 66 (stacker): Starting depsolve for arches: ['i686'].
> 66 (stacker): Finished depsolve (successful), requesting archjobs.
> 66 (stacker/i686): https://lnxbuild1.pok.ibm.com.:8888 - UID is
> 9adf56cdd15bfae2388966b08837250d3bf6772c
> ----------------------------------------
> Exception happened during processing of request from ('10.63.82.73', 49136)
> Traceback (most recent call last):
>   File "/usr/lib64/python2.3/SocketServer.py", line 463, in process_request_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib64/python2.3/SocketServer.py", line 254, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/usr/lib64/python2.3/SocketServer.py", line 521, in __init__
>     self.handle()
>   File "/usr/lib64/python2.3/BaseHTTPServer.py", line 324, in handle
>     self.handle_one_request()
>   File "/usr/lib64/python2.3/BaseHTTPServer.py", line 307, in handle_one_request
>     self.raw_requestline = self.rfile.readline()
>   File "/usr/lib64/python2.3/socket.py", line 338, in readline
>     data = self._sock.recv(self._rbufsize)
>   File "/usr/lib/python2.3/site-packages/plague/SSLConnection.py", line 142, in recv
>     return con.recv(bufsize, flags)
> SysCallError: (-1, 'Unexpected EOF')
> ----------------------------------------
>
> ====== OUR FIX ======
> We added lines 147-148 to the *recv* method of the
> */usr/lib/python2.3/site-packages/plague/SSLConnection.py* module.
> Here's the patch:
>
>
> So, can someone please review the above fix? We want to make sure it
> won't come back to *bite* us later on, or possibly even be *masking* a
> larger problem. Thank you.

This one makes me a bit nervous. The SSL stuff is pretty fragile, since
SSL adds yet another protocol layer on top of everything, with more
handshakes and more state than plain TCP/IP.

The traceback here shouldn't really have any effect, since the exception
just terminates the current thread, and plague's state machine is built
to be resilient to dropped and dead connection threads. I'd like to hide
the traceback (or at least print just a one-line message), but that's not
possible right now: no plague code appears anywhere in the traceback, so
catching it would require more subclassing. Furthermore, it technically
is an error (the other side closed the socket prematurely, or something
broke the connection), but one that we should ignore and retry, which
plague will do.

However, if this fix seems to work OK for you for a while, I'd be
interested in revisiting the issue.

Dan

> -Joe
> --
> Fedora-buildsys-list mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/fedora-buildsys-list
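
The "more subclassing" mentioned in the reply above could look roughly
like the sketch below. It is an illustration only, not plague's code: the
names QuietErrorMixin and QuietThreadingTCPServer are hypothetical, the
set of exception types treated as a dropped connection is an assumption,
and it uses Python 3's socketserver module (named SocketServer in the
Python 2.3 traceback). The standard handle_error() hook is what prints
the traceback, so overriding it is one way to reduce the output to a
single line.

```python
import socketserver
import sys


class QuietErrorMixin:
    """Hypothetical mixin: log dropped connections in one line.

    Intended to be listed before a socketserver server class so that
    its handle_error() override is found first.
    """

    # Assumption: which exceptions mean "the peer went away". A server
    # using an SSL wrapper would also list that library's EOF error
    # here (for example the SysCallError seen in the traceback).
    quiet_errors = (ConnectionError, EOFError)

    def handle_error(self, request, client_address):
        # handle_error() is called from inside the except block, so
        # sys.exc_info() still describes the failure.
        exc_type, exc, _tb = sys.exc_info()
        if exc_type is not None and issubclass(exc_type, self.quiet_errors):
            sys.stderr.write(
                "Connection from %s dropped: %s\n" % (client_address, exc)
            )
        else:
            # Anything unexpected keeps the default full traceback.
            super().handle_error(request, client_address)


class QuietThreadingTCPServer(QuietErrorMixin, socketserver.ThreadingTCPServer):
    """Hypothetical threaded server with the quieter error reporting."""
```

Errors outside quiet_errors still produce the full traceback, so a
genuinely new failure would not be hidden by this change.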
