I am using HttpClient 3.0.0 to download a resource to a file. If a download fails partway, for example because the connection is dropped, the partially downloaded file is kept in place. On the next attempt to download the same resource, only the remainder of the resource is downloaded, using a Range request header.
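The resume step can be sketched roughly as follows. This is a minimal illustration and not our actual download code: the class and method names are hypothetical, and the sketch only computes the Range value from the number of bytes already on disk. With N bytes in the partial file, bytes 0 to N-1 are present, so the request starts at offset N.

```java
import java.io.File;

/** Hypothetical helper (not part of HttpClient): builds the Range value used to resume a download. */
final class ResumeRange {
    private ResumeRange() {}

    /** Returns an open-ended byte range starting at the first byte that is not yet on disk. */
    static String rangeHeaderValue(long bytesAlreadyDownloaded) {
        if (bytesAlreadyDownloaded < 0) {
            throw new IllegalArgumentException("negative length: " + bytesAlreadyDownloaded);
        }
        return "bytes=" + bytesAlreadyDownloaded + "-";
    }

    /** Convenience overload; File.length() returns 0 if the partial file does not exist yet. */
    static String rangeHeaderValue(File partialFile) {
        return rangeHeaderValue(partialFile.length());
    }
}
```

With HttpClient 3.x the value would be set on the method with setRequestHeader("Range", ...), and the file reopened in append mode, for example with new FileOutputStream(file, true). Appending is only valid if the server answers 206 Partial Content; a 200 response carries the whole resource again.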
There is some evidence that users of our download code run into cases where socket reads return -1 prematurely, especially for large downloads. In this case our code validates the downloaded bytes and notices that the content is bad, but because it assumes it downloaded the entire file, it does not resume from the current position; it restarts the download from the beginning. Perhaps these users are hitting limitations imposed by a (nonstandard?) proxy server. In any case, our code should be robust enough to handle this situation by noticing that it received only part of the file.

In the debugger I can force a similar situation by waiting at a breakpoint for a couple of minutes after the transfer of a partial request has begun. The breakpoint is in a loop that reads the socket input stream and writes the bytes to the file. Below is a portion of the stack trace:

Thread [Download Thread 0] (Suspended (breakpoint at line 133 in SocketInputStream))
    owns: BufferedInputStream (id=4954)
    SocketInputStream.read(byte[], int, int) line: 133
    SocketEvents$SocketEventEmittingWrapper$SocketEventEmittingInputStream.read(byte[], int, int) line: 298
    BufferedInputStream.read1(byte[], int, int) line: 265
    BufferedInputStream.read(byte[], int, int) line: 324
    ContentLengthInputStream.read(byte[], int, int) line: 169
    AutoCloseInputStream(FilterInputStream).read(byte[], int, int) line: 134
    AutoCloseInputStream.read(byte[], int, int) line: 107
    ...

In this case SocketInputStream.read returns -1, apparently before all expected bytes have been read. ContentLengthInputStream has seen only 172278 bytes, while according to the Content-Length response header 15219053 bytes are to be retrieved:
<< "Content-Length: 15219053[\r][\n]"
<< "Content-Range: bytes 11695824-26914876/26914877[\r][\n]"

Below are the relevant stack trace and variable values as shown in the debugger. That is in AutoCloseInputStream.close():

AutoCloseInputStream(FilterInputStream).close() line: 183 [local variables unavailable]
AutoCloseInputStream.notifyWatcher() line: 176
AutoCloseInputStream.checkClose(int) line: 152
AutoCloseInputStream.read(byte[], int, int) line: 108

this    AutoCloseInputStream (id=4961)
    in    ContentLengthInputStream (id=4960)
        closed    true
        contentLength    15219053 [0xe8396d]
        pos    172278 [0x2a0f6]
        wrappedStream    BufferedInputStream (id=4954)
    selfClosed    false
    streamOpen    true
    watcher    HttpMethodBase$1 (id=4969)
        this$0    GetMethod (id=4970)

Subsequently no IOException is thrown. My reading of http://www.mail-archive.com/[email protected]/msg03923.html was that one can assume that the entire resource was downloaded when EOF is encountered.

Now I wonder how and where this situation should be handled. One thought is that ContentLengthInputStream is in a position to know and could throw some kind of "premature EOF encountered" exception. Presumably the default retry logic would then retry such a request. Or should this be handled by the application?
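To make the "premature EOF" idea concrete, here is a minimal sketch of a wrapper the application could place around the response stream. It is not HttpClient code and does not claim to show how ContentLengthInputStream should be changed: the class name LengthCheckingInputStream is hypothetical, and the expected length is assumed to be taken from the Content-Length response header by the caller. The wrapper counts the bytes it hands out and throws an EOFException if the underlying stream ends early.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: reports end-of-stream before the expected length as an error. */
final class LengthCheckingInputStream extends FilterInputStream {
    private final long expectedLength; // assumed to come from the Content-Length header
    private long bytesRead;

    LengthCheckingInputStream(InputStream in, long expectedLength) {
        super(in);
        this.expectedLength = expectedLength;
    }

    /** Number of bytes delivered (or skipped) so far. */
    long getBytesRead() {
        return bytesRead;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b < 0) {
            failIfTruncated();
        } else {
            bytesRead++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n < 0) {
            failIfTruncated();
        } else {
            bytesRead += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        bytesRead += skipped; // skipped bytes count as consumed
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would invalidate the byte count
    }

    private void failIfTruncated() throws EOFException {
        if (bytesRead < expectedLength) {
            throw new EOFException("Premature end of stream: expected " + expectedLength
                    + " bytes, received " + bytesRead);
        }
    }
}
```

The copy loop that writes to the file would catch the EOFException, keep the bytes already written, and issue a new ranged request starting at the current file length instead of treating the file as complete. Whether HttpClient's own retry handler would ever see such an exception is a separate question, since the body is read by the application after the method has executed.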
Thanks,
Henrich

Below are HttpClient TRACE wire trace excerpts:

00:18.45 DEBUG [Thread:ModalContext] org.apache.commons.httpclient.params.DefaultHttpParams setParameter
Set parameter http.useragent = Jakarta Commons-HttpClient/3.0
Set parameter http.protocol.version = HTTP/1.1
Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
Set parameter http.protocol.cookie-policy = rfc2109
Set parameter http.protocol.element-charset = US-ASCII
Set parameter http.protocol.content-charset = ISO-8859-1
Set parameter http.method.retry-handler = [EMAIL PROTECTED]
Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
00:18.51 DEBUG [Thread:ModalContext] org.apache.commons.httpclient.HttpClient <clinit>
Java version: 1.5.0
Java vendor: IBM Corporation
Java class path: C:\AD\Target\e_33GA\eclipse\plugins\org.eclipse.equinox.launcher_1.0.0.v20070606.jar
Operating system name: Windows XP
Operating system architecture: x86
Operating system version: 5.1 build 2600 Service Pack 2
IBMJSSE2 1.5: IBM JSSE provider2 (implements IbmX509 key/trust factories, SSLv3, TLSv1)
IBMJCE 1.2: IBMJCE Provider implements the following: HMAC-SHA1, MD2, MD5, MARS, SHA, MD2withRSA, MD5withRSA, SHA1withRSA, RSA, SHA1withDSA, RC2, RC4, Seal)implements the following:
Signature algorithms : SHA1withDSA, SHA1withRSA, MD5withRSA, MD2withRSA, SHA2withRSA, SHA3withRSA, SHA5withRSA
Cipher algorithms : Blowfish, AES, DES, TripleDES, PBEWithMD2AndDES, PBEWithMD2AndTripleDES, PBEWithMD2AndRC2, PBEWithMD5AndDES, PBEWithMD5AndTripleDES, PBEWithMD5AndRC2, PBEWithSHA1AndDES
PBEWithSHA1AndTripleDES, PBEWithSHA1AndRC2
PBEWithSHAAnd40BitRC2, PBEWithSHAAnd128BitRC2
PBEWithSHAAnd40BitRC4, PBEWithSHAAnd128BitRC4
PBEWithSHAAnd2KeyTripleDES, PBEWithSHAAnd3KeyTripleDES
Mars, RC2, RC4, ARCFOUR
RSA, Seal
Message authentication code (MAC) : HmacSHA1, HmacSHA256, HmacSHA384, HmacSHA512, HmacMD2, HmacMD5
Key agreement algorithm : DiffieHellman
Key (pair) generator : Blowfish, DiffieHellman, DSA, AES, DES, TripleDES, HmacMD5, HmacSHA1, Mars, RC2, RC4, RSA, Seal, ARCFOUR
Message digest : MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512
Algorithm parameter generator : DiffieHellman, DSA
Algorithm parameter : Blowfish, DiffieHellman, AES, DES, TripleDES, DSA, Mars, PBEwithMD5AndDES, RC2
Key factory : DiffieHellman, DSA, RSA
Secret key factory : Blowfish, AES, DES, TripleDES, Mars, RC2, RC4, Seal, ARCFOUR PKCS5Key, PBKDF1 and PBKDF2 (PKCS5Derived Key).
Certificate : X.509
Secure random : IBMSecureRandom
Key store : JCEKS, PKCS12KS (PKCS12), JKS
IBMJGSSProvider 1.5: IBMJGSSProvider supports Kerberos V5 Mechanism
IBMCertPath 1.1: IBMCertPath Provider implements the following: CertificateFactory : X.509 CertPathValidator : PKIX CertStore : Collection, LDAP CertPathBuilder : PKIX
IBMSASL 1.5: IBM SASL provider(implements client mechanisms for: DIGEST-MD5, GSSAPI, EXTERNAL, PLAIN, CRAM-MD5; server mechanisms for: DIGEST-MD5, GSSAPI, CRAM-MD5)
00:18.56 DEBUG [Thread:ModalContext] org.apache.commons.httpclient.params.DefaultHttpParams setParameter
Set parameter http.authentication.credential-provider = [EMAIL PROTECTED]
Set parameter http.connection-manager.timeout = 30000
Set parameter http.socket.timeout = 30000
Set parameter http.tcp.nodelay = true
Set parameter http.connection-manager.max-per-host = {HostConfiguration []=1}
Set parameter http.connection-manager.max-total = 20
Set parameter http.method.retry-handler = com.ibm.cic.common.transports.httpclient.HttpClientDownloadHandler [EMAIL PROTECTED]
...
03:17.93 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.Wire wire
>> "HEAD /bluewhale/products/AllGAs/repository/plugins/com.ibm.process.config.rsm_7.0.0.v20061101.jar HTTP/1.1[\r][\n]"
03:17.93 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpMethodBase addHostRequestHeader
Adding Host request header
03:17.93 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.Wire wire
>> "Accept-Language: en_US[\r][\n]"
>> "User-Agent: Jakarta Commons-HttpClient/3.0[\r][\n]"
>> "Host: constellation.beaverton.ibm.com[\r][\n]"
>> "[\r][\n]"
<< "HTTP/1.1 200 OK[\r][\n]"
<< "Date: Tue, 28 Aug 2007 18:40:02 GMT[\r][\n]"
<< "Server: Apache/2.0.52 (Red Hat)[\r][\n]"
<< "Last-Modified: Tue, 12 Jun 2007 20:48:27 GMT[\r][\n]"
<< "ETag: "4f946f-19ab03d-9e8a9cc0"[\r][\n]"
<< "Accept-Ranges: bytes[\r][\n]"
<< "Content-Length: 26914877[\r][\n]"
<< "Connection: close[\r][\n]"
<< "Content-Type: text/plain; charset=UTF-8[\r][\n]"
03:17.95 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpMethodBase shouldCloseConnection
Should close connection in response to directive: close
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpConnection releaseConnection
Releasing connection back to connection manager.
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.ConnectionPool freeConnection
Freeing connection, hostConfig=HostConfiguration[host= http://constellation.beaverton.ibm.com.]
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.util.IdleConnectionHandler add
Adding connection at: 1188326399109
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.ConnectionPool notifyWaitingThread
Notifying no-one, there are no waiting threads
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.params.DefaultHttpParams setParameter
Set parameter http.method.retry-handler = com.ibm.cic.common.transports.httpclient.HttpClientDownloadHandler [EMAIL PROTECTED]
03:27.56 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager getConnectionWithTimeout
HttpConnectionManager.getConnection: config = HostConfiguration[host= http://constellation.beaverton.ibm.com.], timeout = 30000
03:27.59 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.ConnectionPool getFreeConnection
Getting free connection, hostConfig=HostConfiguration[host= http://constellation.beaverton.ibm.com.]
03:27.59 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpConnection open
Open connection to constellation.beaverton.ibm.com:80
03:27.59 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.Wire wire
>> "GET /bluewhale/products/AllGAs/repository/plugins/com.ibm.process.config.rsm_7.0.0.v20061101.jar HTTP/1.1[\r][\n]"
03:27.59 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpMethodBase addHostRequestHeader
Adding Host request header
03:27.59 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.Wire wire
>> "Range: bytes=11695824-[\r][\n]"
>> "Accept-Language: en_US[\r][\n]"
>> "User-Agent: Jakarta Commons-HttpClient/3.0[\r][\n]"
>> "Host: constellation.beaverton.ibm.com[\r][\n]"
>> "[\r][\n]"
<< "HTTP/1.1 206 Partial Content[\r][\n]"
<< "Date: Tue, 28 Aug 2007 18:40:12 GMT[\r][\n]"
<< "Server: Apache/2.0.52 (Red Hat)[\r][\n]"
<< "Last-Modified: Tue, 12 Jun 2007 20:48:27 GMT[\r][\n]"
<< "ETag: "4f946f-19ab03d-9e8a9cc0"[\r][\n]"
<< "Accept-Ranges: bytes[\r][\n]"
<< "Content-Length: 15219053[\r][\n]"
<< "Content-Range: bytes 11695824-26914876/26914877[\r][\n]"
<< "Connection: close[\r][\n]"
<< "Content-Type: text/plain; charset=UTF-8[\r][\n]"
54:38.48 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpMethodBase shouldCloseConnection
Should close connection in response to directive: close
55:26.04 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.HttpConnection releaseConnection
Releasing connection back to connection manager.
55:26.04 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.ConnectionPool freeConnection
Freeing connection, hostConfig=HostConfiguration[host= http://constellation.beaverton.ibm.com.]
55:26.04 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.util.IdleConnectionHandler add
Adding connection at: 1188329517593
55:26.04 DEBUG [Thread:Download Thread 0] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.ConnectionPool notifyWaitingThread
Notifying no-one, there are no waiting threads
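
For reference, the number of bytes the 206 response in the trace promises can be cross-checked against its Content-Range header, which gives an application-level way to decide whether a transfer was complete. The sketch below is only an illustration under assumptions: the ContentRange class is hypothetical, and it handles only the "bytes first-last/total" form seen in the trace (plus "*" for an unknown total).

```java
/** Hypothetical value class for a "bytes first-last/total" Content-Range header value. */
final class ContentRange {
    final long first;
    final long last;
    final long total; // -1 if the server sent "*" for an unknown total

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    /** Parses values such as "bytes 11695824-26914876/26914877". */
    static ContentRange parse(String headerValue) {
        String v = headerValue.trim();
        if (!v.startsWith("bytes ")) {
            throw new IllegalArgumentException("Unsupported range unit: " + headerValue);
        }
        int dash = v.indexOf('-');
        int slash = v.indexOf('/');
        if (dash < 0 || slash < dash) {
            throw new IllegalArgumentException("Malformed Content-Range: " + headerValue);
        }
        long first = Long.parseLong(v.substring("bytes ".length(), dash).trim());
        long last = Long.parseLong(v.substring(dash + 1, slash).trim());
        String totalText = v.substring(slash + 1).trim();
        long total = "*".equals(totalText) ? -1L : Long.parseLong(totalText);
        if (last < first) {
            throw new IllegalArgumentException("Malformed Content-Range: " + headerValue);
        }
        return new ContentRange(first, last, total);
    }

    /** Number of bytes in the range; both ends are inclusive. */
    long length() {
        return last - first + 1;
    }

    /** True only if exactly the promised number of bytes arrived. */
    boolean isComplete(long bytesReceived) {
        return bytesReceived == length();
    }
}
```

For the header in the trace, length() is 26914876 - 11695824 + 1 = 15219053, which matches the Content-Length value, so isComplete(172278) is false and the next ranged request would start at offset 11695824 + 172278 = 11868102.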
