[
https://issues.apache.org/jira/browse/NET-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656669#action_12656669
]
themusicgod1 edited comment on NET-160 at 12/15/08 9:40 AM:
----------------------------------------------------------
There may be multiple points where hanging can occur, but it seems like a lot
of them share a common characteristic, described somewhat by Guy Rouillier in
this thread.
For some of these hangs it looks like from within copyStream(), the InputStream
source is blocking and not being refilled. The fact that it's blocking at at
java.net.SocketInputStream.socketRead0(Native Method) is just a symptom of this
level's not knowing that the InputStream is not available for reading. It's
easy to simulate, too; merely unplug your network cable during a read. A
connect() will rightly detect that there's nothing there to connect to, but
while read(), 95% of the time seems to block indefinitely, even when
setSoTimeout() and other set*Timeout()s are used.
As far as copyStream() is concerned, You can actually detect when it blocks,
but there's a problem with that in that it sometimes blocks and then waits, and
then after a short period(ms) source is refilled. One way to get around this in
realistic situations is to have another Timeout\*\*\*; this one from within
Util itself, or probably better yet, encapsulate Util in a FTPClient specific
Util that has this extra bit of baggage for these situations\*. It seems to
work\*\*, but only for some aspects of this problem\*\*\*\*. It'll at least
give you an CopyStreamException, which for critical applications is a lot
better than infinite waits on a nonexistent input streams caused by noncritical
network error.
Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not
available [native method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160
ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect:
__getReply() is being called and its result is not being given to it, so it
waits. Forever. Instead of at least throwing an Exception.
To perhaps rephrase(after a weekend to think about it); what seems to be
happening, all over the FTPClient code, is any time a read is made to a stream
before it is filled ends up being an infinite block. And this can happen. For
example, in FTPClient.__getReply(), if a call is made to __getReply before the
server actually replies, __getReply() blocks indefinitely. Even if the server
actually does reply a millisecond or so after the indefinite blocking begins.
\* I'm a fairly new java programmer so maybe this is not optimal.
\*\* I'll attatch my updated Util class to this thread.
\*\*\* Someone pointed out that this is basically making setSoTimeout work. Not
sure what to do about this
\*\*\*\* albeit the most common, the actual downloading of files.
Uploading/connecting/disconnecting/other not really impacted
was (Author: themusicgod1):
There may be multiple points where hanging can occur, but it seems like a
lot of them share a common characteristic, described somewhat by Guy Rouillier
in this thread.
For some of these hangs it looks like from within copyStream(), the InputStream
source is blocking and not being refilled. The fact that it's blocking at at
java.net.SocketInputStream.socketRead0(Native Method) is just a symptom of this
level's not knowing that the InputStream is not available for reading. It's
easy to simulate, too; merely unplug your network cable during a read. A
connect() will rightly detect that there's nothing there to connect to, but
while read(), 95% of the time seems to block indefinitely, even when
setSoTimeout() and other set*Timeout()s are used.
As far as copyStream() is concerned, You can actually detect when it blocks,
but there's a problem with that in that it sometimes blocks and then waits, and
then after a short period(ms) source is refilled. One way to get around this in
realistic situations is to have another Timeout***; this one from within Util
itself, or probably better yet, encapsulate Util in a FTPClient specific Util
that has this extra bit of baggage for these situations*. It seems to work**,
but only for some aspects of this problem****. It'll at least give you an
CopyStreamException, which for critical applications is a lot better than
infinite waits on a nonexistent input streams caused by noncritical network
error.
Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not
available [native method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160
ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect:
__getReply() is being called and its result is not being given to it, so it
waits. Forever. Instead of at least throwing an Exception.
To perhaps rephrase(after a weekend to think about it); what seems to be
happening, all over the FTPClient code, is any time a read is made to a stream
before it is filled ends up being an infinite block. And this can happen. For
example, in FTPClient.__getReply(), if a call is made to __getReply before the
server actually replies, __getReply() blocks indefinitely. Even if the server
actually does reply a millisecond or so after the indefinite blocking begins.
* I'm a fairly new java programmer so maybe this is not optimal.
** I'll attatch my updated Util class to this thread.
*** Someone pointed out that this is basically making setSoTimeout work. Not
sure what to do about this
**** albeit the most common, the actual downloading of files.
Uploading/connecting/disconnecting/other not really impacted
> FTP Client API Hangs - Attached Scenario.
> -----------------------------------------
>
> Key: NET-160
> URL: https://issues.apache.org/jira/browse/NET-160
> Project: Commons Net
> Issue Type: Bug
> Affects Versions: 1.4
> Environment: Suse Enterprise Linux 9.0, Intel
> Reporter: Lvenkataraman
> Attachments: Util.java
>
>
> FTP client hangs with the following error. Here is the truss output. The
> client hangs indefinitely and no work is being done other than this error
> when you truss the process. I use Java 1.5 to run this. We wrote a
> long-running ftp daemon and this happens after 20-23 hours of continuous
> running.
> Here is the code to pull files from the ftp server. This one lists files that
> are received after this particular "incal" time (currently it lists but the
> intention is to implement an incremental download). We initially thought
> doing periodic noop would keep the client alive by sending dummy commands to
> the server. But we found this probelm still occurs.
> public void ListIncrementalFiles (String directory, Calendar incal)
> {
> try
> {
> FTPFile[] files = listFiles(directory);
>
> int total_files = 0;
>
> System.out.println("FileName\tFileSize\t\tDate Time
> Stamp");
> for (int i = 0; i < files.length; i++)
> {
>
>
> FTPFile file = files[i];
> if ((i % 1000) == 0)
> {
> System.out.println("noop
> occurred at this instance---> " + i + "\n");
> noop();
> }
>
> if (file.isFile())
> {
> Calendar c =
> file.getTimestamp();
>
>
> if (c.after(incal) ||
> c.equals(incal))
> {
>
>
> System.out.println(file.getName() + "\t" + file.getSize() + "\t\t" +
> c.getTime());
> total_files++;
> }
> }
>
> }
> System.out.println("Total Files in the listing ----> "
> + total_files);
> }
> catch (Exception e)
> {
> System.out.println("FTP Getfiles exception occured");
> e.printStackTrace();
> }
>
> }
> ---------------------------------------------------------------------------------------------------------------------------------Truss
> output begins--------------------------------------------------------
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /9: lwp_park(0x00000000, 0) (sleeping...)
> /3: lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /8: lwp_cond_wait(0x0014A908, 0x0014A8F0, 0x00000000, 0) (sleeping...)
> /10: lwp_cond_wait(0x000366E8, 0x000366D0, 0x00000000, 0) (sleeping...)
> /1: pollsys(0x00000000, 0, 0xFFBFEB60, 0x00000000) (sleeping...)
> /11: lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /4: lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /12: lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /2: lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /7: lwp_cond_wait(0x00149F98, 0x00149F80, 0x00000000, 0) (sleeping...)
> /5: lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /13: lwp_cond_wait(0x00034958, 0x00034940, 0x00000000, 0) (sleeping...)
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /6: lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> /14: pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000) = 0
> ------------------------------------------------------------------------------------------Truss
> output
> ends-------------------------------------------------------------------------------------------
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.