2010/3/26 Sai Pullabhotla <[email protected]>:
> David,
>
> I just re-read your comments towards the end of your previous email:
>
> "I wonder if we are suffering a similar problem in any other cases; if
> it was so, we might need to delay the opening of the ServerSocket
> until the LIST (or GET or PUT...) commands are executed"
>
> Do you think creating/binding a new ServerSocket could potentially
> take a long time? Is that your concern?

Not really, my concern here was that we could have some concurrency
issue, but this shouldn't be a problem anymore with the wait() calls
removed.

> Regards,
> Sai Pullabhotla
>
> On Fri, Mar 26, 2010 at 7:11 AM, David Latorre <[email protected]> wrote:
>> 2010/3/26 Niklas Gustavsson <[email protected]>:
>>> On Fri, Mar 26, 2010 at 9:50 AM, Fred Moore <[email protected]> wrote:
>>>> 1\ Priority of passive port sharing enhancement: Niklas's survey shows
>>>> that we are indeed in good company here, but it's probably worth
>>>> having a better look at this anyway; there might be good technical
>>>> reasons that led all the other teams not to support this, or it may
>>>> turn out that it's "simply" because it's somewhat hard to develop and
>>>> test.
>>>
>>> After this discussion I'm significantly less thrilled at implementing
>>> shared passive ports :-)
>>
>> Shared passive ports would be a nice feature if they aren't too hard
>> to implement. Among the open-source servers, I think coloradoFTP -a
>> NIO-based Java FTPServer under the LGPL license- offered this (since
>> their data connections also use async sockets this shouldn't be too
>> hard for them, but I don't know if they solved the use case depicted
>> by Sai: when there are several sessions open from the same IP) but it
>> seems that commercial solutions offer this and more...
>>
>>>> 2\ Quick fix for 1.0.x codebase: pushing a 40x to the client when no
>>>> passive port is available (or probably better: no passive port is
>>>> available within X seconds) is probably something we need to do
>>>> anyway.
>>>
>>> Thinking some more about this, I'm personally now convinced that we
>>> should simply return an error (not waiting). I'm not sure what the
>>> best reply code should be, but "425 Can't open data connection" seems
>>> fitting although not specified as a valid return from the PASV command.
>>>
>>>> 3\ Suspect race condition: the problem description for the originally
>>>> reported http://issues.apache.org/jira/browse/FTPSERVER-359 (see also
>>>> repro code attached to the jira) actually hints at something different
>>>> as well; in fact we state that a few (say 20) parallel threads issuing
>>>> LISTs in passive mode are able to "lock up" the server forever.
>>>> Questions:
>>>>
>>>> 3.1\ Is this entirely explained by this thread discussion? (I don't
>>>> think so: the server should *always* be able to recover)
>>>
>>> Agreed, the server should always recover from a situation like this.
>>> After looking into how to fix item 2, we need to rerun your tests and
>>> make sure we always survive.
>>
>> Thinking about this issue, my understanding of the problem is as follows:
>>
>> 1. We have more connections to FTPServer than the Executor thread pool
>> max size (I think it is 16), all sending the PASV command.
>>
>> 2. The first one of them requests the only available port and gets it.
>> Now the port is in use by a server socket and any subsequent call to
>> requestPassivePort will end up invoking wait().
>>
>> 3. The thread that processed this PASV command is now available and a
>> new PASV request is assigned to it.
>>
>> 4. Now all threads are trying to request a passive port, but since
>> there are no ports available all the threads in the OrderedThreadPool
>> get blocked by the wait() method.
>>
>> I wonder if we are suffering a similar problem in any other cases; if
>> it was so, we might need to delay the opening of the ServerSocket
>> until the LIST (or GET or PUT...) commands are executed.
>>
>> I hope I made myself clear and that my understanding was right.
>>
>>>> 3.2\ Would this be fixed by a quick fix as per 2\? (likely, but it's
>>>> sort of like using nukes for mowing the lawn)
>>>
>>> I really have no idea, but I think we should fix 2 first and then make
>>> sure we handle your test case.
>>>
>>>> In short my current position can be stated as follows: I think that
>>>> FTPSERVER-359 has a different root cause from what we discussed; the
>>>> problem impact is not completely known at the moment but it appears to
>>>> *severely* affect the server availability... having just one port is
>>>> an easy way of reproducing it (but not the cause of it).
>>>
>>> Agreed.
>>>
>>> /niklas
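
To make the "return an error instead of waiting" idea from item 2
concrete, here is a minimal sketch. It is not the FtpServer code: the
class PassivePortPool, its methods, the NO_PORT sentinel and the
127,0,0,1 address in the 227 reply are all hypothetical names and values
picked for illustration. The point is only that reserve() returns at
once, so a pool thread handling PASV can never park in wait(), and the
caller can answer with the 425 reply suggested above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical non-blocking pool of passive ports (not the FtpServer API). */
class PassivePortPool {
    /** Sentinel returned when every configured port is in use. */
    static final int NO_PORT = -1;

    private final Deque<Integer> free = new ArrayDeque<>();

    PassivePortPool(int... ports) {
        for (int p : ports) {
            free.add(p);
        }
    }

    /** Returns a free port immediately, or NO_PORT; never calls wait(). */
    synchronized int reserve() {
        Integer port = free.poll();
        return port == null ? NO_PORT : port;
    }

    /** Puts a port back once its data connection has been closed. */
    synchronized void release(int port) {
        free.add(port);
    }

    /** Builds the reply line a PASV handler could send for the given port. */
    static String pasvReply(int port) {
        if (port == NO_PORT) {
            // Reply code proposed in this thread; not listed for PASV in the RFC.
            return "425 Can't open data connection.";
        }
        // The port is sent as two bytes: p1 = port / 256, p2 = port % 256.
        // The address 127,0,0,1 is an assumption for this example only.
        return "227 Entering Passive Mode (127,0,0,1,"
                + (port / 256) + "," + (port % 256) + ")";
    }
}
```

With a single configured port, the second concurrent PASV would get the
425 line straight away and its worker thread would go back to the pool,
which is the behaviour we would want to confirm by rerunning the
FTPSERVER-359 test.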
