2010/3/26 Sai Pullabhotla <[email protected]>:
> David,
>
> I just re-read your comments towards the end of your previous email:
>
> "I wonder if we are suffering a similar problem in any other cases; if
> it was so, we might need to delay the opening of the ServerSocket
> until the LIST (or GET or PUT...) commands are executed"
>
> Do you think creating/binding a new ServerSocket could potentially
> take a long time? Is that your concern?

Not really. My concern here was that we could have some concurrency
issue, but this shouldn't be a problem anymore with the wait() calls
removed.
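
To illustrate the idea of requesting a port without wait(), here is a
minimal sketch. It is not the actual FtpServer code: the class
PassivePortPool, its method names and the 127,0,0,1 host in the 227 reply
are all made up for the example. The point is only that a request never
blocks; when no port is free the caller gets -1 right away and can send
the "425 Can't open data connection" reply suggested below, so the worker
thread goes back to the pool.

```java
import java.util.BitSet;

/** Hypothetical non-blocking passive port pool (illustration only). */
final class PassivePortPool {
    private final int firstPort;
    private final int size;
    private final BitSet inUse;

    PassivePortPool(int firstPort, int size) {
        this.firstPort = firstPort;
        this.size = size;
        this.inUse = new BitSet(size);
    }

    /** Returns a free port, or -1 immediately if none is free. Never waits. */
    synchronized int tryReserve() {
        int idx = inUse.nextClearBit(0);
        if (idx >= size) {
            return -1;
        }
        inUse.set(idx);
        return firstPort + idx;
    }

    /** Marks a port as free again; ports outside the range are ignored. */
    synchronized void release(int port) {
        int idx = port - firstPort;
        if (idx >= 0 && idx < size) {
            inUse.clear(idx);
        }
    }

    /** Builds the PASV reply; the host part is an assumption for the example. */
    static String replyFor(int port) {
        if (port < 0) {
            return "425 Can't open data connection";
        }
        // RFC 959 encodes the port as two bytes: p1 = port / 256, p2 = port % 256.
        return "227 Entering Passive Mode (127,0,0,1,"
                + (port / 256) + "," + (port % 256) + ")";
    }
}
```

With a single configured port, the second PASV gets the 425 reply instead
of parking a thread until the first data connection is closed.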



> Regards,
> Sai Pullabhotla
>
>
>
>
>
> On Fri, Mar 26, 2010 at 7:11 AM, David Latorre <[email protected]> wrote:
>> 2010/3/26 Niklas Gustavsson <[email protected]>:
>>> On Fri, Mar 26, 2010 at 9:50 AM, Fred Moore <[email protected]> wrote:
>>>> 1\ Priority of passive port sharing enhancement: Niklas's survey shows
>>>> that we are indeed in good company here, but it's probably worth having a
>>>> better look at this anyway; there might be good technical reasons that led
>>>> all the other teams not to support this, or it may turn out that it's
>>>> "simply" because it's somewhat hard to develop and test.
>>>
>>> After this discussion I'm significantly less thrilled at implementing
>>> shared passive ports :-)
>>
>> Shared passive ports would be a nice feature if they aren't too hard
>> to implement. Among the opensource servers, I think coloradoFTP -a
>> NIO-based java FTPServer under the LGPL license- offered this (since
>> their data connections also use async sockets this shouldn't be too
>> hard for them, but I don't know if they solved the use case depicted
>> by Sai: when there are several sessions open from the same IP)  but it
>> seems that commercial solutions offer this and more...
>>
>>
>>
>>>> 2\ Quick fix for 1.0.x codebase: pushing a 40x to the client when no
>>>> passive port is available (or probably better: no passive port is available
>>>> within X seconds) is probably something we need to do anyway.
>>>
>>> Thinking some more about this, I'm personally now convinced that we
>>> should simply return an error (not waiting). I'm not sure what the
>>> best reply code should be, but "425 Can't open data connection" seems
>>> fitting although not specified as a valid reply to the PASV command.
>>>
>>>> 3\ Suspect race condition: the problem description for the originally
>>>> reported http://issues.apache.org/jira/browse/FTPSERVER-359 (see also repro
>>>> code attached to the jira) actually hints at something different as well;
>>>> in fact we state that a few (say 20) parallel threads issuing LISTs in
>>>> passive mode are able to "lock-up" the server forever. Questions:
>>>>
>>>> 3.1\ Is this entirely explained by this thread discussion? (I don't think
>>>> so: the server should *always* be able to recover)
>>>
>>> Agreed, the server should always recover from a situation like this.
>>> After looking into how to fix item 2, we need to rerun your tests and
>>> make sure we always survive.
>>
>> Thinking about this issue my understanding of the problem is as follows:
>>
>> 1. We have more connections to FTPServer than the Executor thread pool's
>> max size (I think it is 16), all sending the PASV command.
>>
>> 2. The first one of them requests the only available port and gets it.
>> Now the port is in use by a server socket and any subsequent call to
>> requestPassivePort will end up invoking wait().
>>
>> 3. The thread that processed this PASV command is now available and a
>> new PASV request is assigned to it.
>>
>> 4. Now all threads are trying to request a passive port, but since
>> there are no ports available, all the threads in the OrderedThreadPool
>> get blocked by the wait() method.
>>
>> I wonder if we are suffering a similar problem in any other cases; if
>> it was so, we might need to delay the opening of the ServerSocket
>> until the LIST (or GET or PUT...) commands are executed.
>>
>> I hope I made myself clear and that my understanding was right.
>>
>>
>>>> 3.2\ Would this be fixed by a quick fix as per 2\? (likely, but it's sort
>>>> of like using nukes to mow the lawn)
>>>
>>> I really have no idea, but I think we should fix 2 first and then make
>>> sure we handle your test case.
>>>
>>>> In short, my current position can be stated as follows: I think that
>>>> FTPSERVER-359 has a different root cause from what we discussed. The
>>>> problem's impact is not completely known at the moment, but it appears to
>>>> *severely* affect the server availability... having just one port is an
>>>> easy way of reproducing it (but not the cause of it).
>>>
>>> Agreed.
>>>
>>> /niklas
>>>
>>
>
