[ 
https://issues.apache.org/jira/browse/NET-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656669#action_12656669
 ] 

themusicgod1 edited comment on NET-160 at 12/15/08 10:17 AM:
-----------------------------------------------------------

There may be multiple points where hanging can occur, but it seems like a lot 
of them share a common characteristic, described somewhat by Guy Rouillier in 
this thread.

For some of these hangs it looks like from within copyStream(), the InputStream 
source is blocking and not being refilled. The fact that it's blocking at at 
java.net.SocketInputStream.socketRead0(Native Method) is just a symptom of this 
level's not knowing that the InputStream is not available for reading. It's 
easy to simulate, too; merely unplug your network cable during a read. A 
connect() will rightly detect that there's nothing there to connect to, but 
while read(), 95% of the time seems to block indefinitely, even when 
setSoTimeout() and other set*Timeout()s are used.

As far as copyStream() is concerned, You can actually detect when it blocks, 
but there's a problem with that in that it sometimes blocks and then waits, and 
then after a short period(ms) source is refilled. One way to get around this in 
realistic situations is to have another Timeout\*\*\*; this one from within 
Util itself, or probably better yet, encapsulate Util in a FTPClient specific 
Util that has this extra bit of baggage for these situations\*. It seems to 
work\*\*, but only for some aspects of this problem\*\*\*\*. It'll at least 
give you an CopyStreamException, which for critical applications is a lot 
better than infinite waits on a nonexistent input streams caused by noncritical 
network error.

Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not 
available [native method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160

ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect: 
__getReply() is being called and its result is not being given to it, so it 
waits. Forever. Instead of at least throwing an Exception.

To perhaps rephrase(after a weekend to think about it); what seems to be 
happening, all over the FTPClient code, is any time a read is made to a stream 
before it is filled ends up being an infinite block. And this can happen. For 
example, in FTPClient's __getReply(). If a call is made to __getReply() before 
the server actually replies, __getReply() blocks indefinitely. Even if the 
server actually does reply a millisecond or so after the indefinite blocking 
begins.

*edit*: oh I might want to point out that I'm on ubuntu 8.10/sun java 5-6 
(u10)/filezilla server/passive mode

\* I'm a fairly new java programmer so maybe this is not optimal.
\*\* I'll attatch my updated Util class to this thread.
\*\*\* Someone pointed out that this is basically making setSoTimeout work. Not 
sure what to do about this
\*\*\*\* albeit the most common, the actual downloading of files. 
Uploading/connecting/disconnecting/other not really impacted



      was (Author: themusicgod1):
    There may be multiple points where hanging can occur, but it seems like a 
lot of them share a common characteristic, described somewhat by Guy Rouillier 
in this thread.

For some of these hangs it looks like from within copyStream(), the InputStream 
source is blocking and not being refilled. The fact that it's blocking at at 
java.net.SocketInputStream.socketRead0(Native Method) is just a symptom of this 
level's not knowing that the InputStream is not available for reading. It's 
easy to simulate, too; merely unplug your network cable during a read. A 
connect() will rightly detect that there's nothing there to connect to, but 
while read(), 95% of the time seems to block indefinitely, even when 
setSoTimeout() and other set*Timeout()s are used.

As far as copyStream() is concerned, You can actually detect when it blocks, 
but there's a problem with that in that it sometimes blocks and then waits, and 
then after a short period(ms) source is refilled. One way to get around this in 
realistic situations is to have another Timeout\*\*\*; this one from within 
Util itself, or probably better yet, encapsulate Util in a FTPClient specific 
Util that has this extra bit of baggage for these situations\*. It seems to 
work\*\*, but only for some aspects of this problem\*\*\*\*. It'll at least 
give you an CopyStreamException, which for critical applications is a lot 
better than infinite waits on a nonexistent input streams caused by noncritical 
network error.

Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not 
available [native method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160

ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect: 
__getReply() is being called and its result is not being given to it, so it 
waits. Forever. Instead of at least throwing an Exception.

To perhaps rephrase(after a weekend to think about it); what seems to be 
happening, all over the FTPClient code, is any time a read is made to a stream 
before it is filled ends up being an infinite block. And this can happen. For 
example, in FTPClient's __getReply(). If a call is made to __getReply() before 
the server actually replies, __getReply() blocks indefinitely. Even if the 
server actually does reply a millisecond or so after the indefinite blocking 
begins.

*edit*: oh I might want to point out that I'm on ubuntu 8.10/sun java 6 
(u10)/filezilla server/passive mode

\* I'm a fairly new java programmer so maybe this is not optimal.
\*\* I'll attatch my updated Util class to this thread.
\*\*\* Someone pointed out that this is basically making setSoTimeout work. Not 
sure what to do about this
\*\*\*\* albeit the most common, the actual downloading of files. 
Uploading/connecting/disconnecting/other not really impacted


  
> FTP Client API Hangs - Attached Scenario.
> -----------------------------------------
>
>                 Key: NET-160
>                 URL: https://issues.apache.org/jira/browse/NET-160
>             Project: Commons Net
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Suse Enterprise Linux 9.0, Intel
>            Reporter: Lvenkataraman
>         Attachments: Util.java
>
>
> FTP client hangs with the following error.   Here is the truss output. The 
> client hangs indefinitely and no work is being done other than this error 
> when you truss the process.  I use Java 1.5 to run this.  We wrote a 
> long-running ftp daemon and this happens after 20-23 hours of continuous 
> running.
> Here is the code to pull files from the ftp server. This one lists files that 
> are received after this particular "incal" time (currently it lists but the 
> intention is to implement an incremental download).  We initially thought 
> doing periodic noop would keep the client alive by sending dummy commands to 
> the server. But we found this probelm still occurs.
> public void ListIncrementalFiles (String directory, Calendar incal)
>       {
>               try 
>               {
>                       FTPFile[] files = listFiles(directory);
>                       
>                       int total_files = 0;
>                                       
>                       System.out.println("FileName\tFileSize\t\tDate Time 
> Stamp");
>                       for (int i = 0; i < files.length; i++)
>                       {
>                                                                               
>                 
>                                       FTPFile file = files[i];
>                                       if ((i % 1000) == 0)
>                                       {
>                                               System.out.println("noop 
> occurred at this instance---> " + i + "\n");
>                                               noop();
>                                       }
>                                                       
>                                       if (file.isFile())
>                                       {
>                                               Calendar c = 
> file.getTimestamp();
>                                                                               
>                 
>                                               if (c.after(incal) || 
> c.equals(incal))
>                                               {
>                                                               
>                                                               
> System.out.println(file.getName() + "\t" + file.getSize() + "\t\t" + 
> c.getTime());
>                                                               total_files++;
>                                               }
>                                        }
>                               
>                       }
>                       System.out.println("Total Files in the listing ----> " 
> + total_files);
>               } 
>               catch (Exception e)
>               {
>                       System.out.println("FTP Getfiles exception occured");
>                       e.printStackTrace();
>               }
>     
>       }
> ---------------------------------------------------------------------------------------------------------------------------------Truss
>  output begins--------------------------------------------------------
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /9:   lwp_park(0x00000000, 0)         (sleeping...)
> /3:   lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /8:   lwp_cond_wait(0x0014A908, 0x0014A8F0, 0x00000000, 0) (sleeping...)
> /10:  lwp_cond_wait(0x000366E8, 0x000366D0, 0x00000000, 0) (sleeping...)
> /1:   pollsys(0x00000000, 0, 0xFFBFEB60, 0x00000000) (sleeping...)
> /11:  lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /4:   lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /12:  lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /2:   lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /7:   lwp_cond_wait(0x00149F98, 0x00149F80, 0x00000000, 0) (sleeping...)
> /5:   lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /13:  lwp_cond_wait(0x00034958, 0x00034940, 0x00000000, 0) (sleeping...)
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /6:   lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> /14:  pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)  = 0
> ------------------------------------------------------------------------------------------Truss
>  output 
> ends-------------------------------------------------------------------------------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to