[ 
https://issues.apache.org/jira/browse/HTTPCORE-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461288#comment-13461288
 ] 

Ian Blavins commented on HTTPCORE-195:
--------------------------------------

G'day

I was experiencing the same problem as the originator of this issue (but since 
I was running later code I was actually getting the new TruncatedChunkException 
that was added to the code as a result of this issue).

It turned out I was experiencing the problem because I was closing the 
connection to the web server while I was still reading the chunk. I suspect 
that is why the originator was having his problems. I suggest the reason "... 
we only crawled 20 websites before we started running into this problem." was 
that the first 19 didn't use chunked output, and the reason "We are frequently 
encountering this issue" was that there are plenty of sites that do chunk.

Note that it would be possible to process a chunked site without error if the 
chunk read(s) happened to complete before the connection close. So the fact 
that some chunked sites were processed without error wouldn't necessarily 
disprove the suggestion. I would expect some chunked sites to show the problem 
reliably and others to show it only some of the time.

That being said, I didn't find the TruncatedChunkException to be much help 
because I was working at the HttpResponse and HttpClient level. By the time the 
exception reached that level it was far too late to do anything useful about 
it. For the exception to be useful at that level there would need to be a 
parameter in CoreConnectionPNames that callers of ChunkedInputStream could use 
to decide whether to treat TruncatedChunkException as fatal or as end of file. 
There is already a parameter that deals with buffering of small chunks, so 
users of ChunkedInputStream would appear to have access to the parameters.
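
To make the suggestion concrete, here is a rough sketch of the "fatal or end of 
file" choice written as a wrapper around the body stream. It is not HttpCore 
code. TruncatedBodyException is a hypothetical stand-in for 
TruncatedChunkException so that the example compiles without the library, and 
the lenient flag stands in for the proposed parameter, which does not exist in 
CoreConnectionPNames as far as this comment is concerned. TruncatedSource only 
simulates a body that is cut off part way through.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.http.TruncatedChunkException.
class TruncatedBodyException extends IOException {
    TruncatedBodyException(String message) {
        super(message);
    }
}

// Wraps a body stream; when lenient, truncation is reported as end of stream.
class TruncationTolerantInputStream extends FilterInputStream {
    private final boolean lenient;
    private boolean truncated;

    TruncationTolerantInputStream(InputStream in, boolean lenient) {
        super(in);
        this.lenient = lenient;
    }

    @Override
    public int read() throws IOException {
        if (truncated) {
            return -1;
        }
        try {
            return super.read();
        } catch (TruncatedBodyException e) {
            return handle(e);
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (truncated) {
            return -1;
        }
        try {
            return super.read(b, off, len);
        } catch (TruncatedBodyException e) {
            return handle(e);
        }
    }

    // Strict mode rethrows; lenient mode remembers the truncation and signals EOF.
    private int handle(TruncatedBodyException e) throws TruncatedBodyException {
        if (!lenient) {
            throw e;
        }
        truncated = true;
        return -1;
    }

    boolean wasTruncated() {
        return truncated;
    }
}

// Simulated body: returns the given bytes, then fails as a cut-off chunk would.
class TruncatedSource extends InputStream {
    private final byte[] data;
    private int pos;

    TruncatedSource(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() throws IOException {
        if (pos < data.length) {
            return data[pos++] & 0xff;
        }
        throw new TruncatedBodyException("Truncated chunk");
    }
}

class TruncationDemo {
    // Reads the stream until it reports end of file and returns the bytes seen.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello".getBytes(StandardCharsets.US_ASCII);
        TruncationTolerantInputStream in =
                new TruncationTolerantInputStream(new TruncatedSource(body), true);
        String text = new String(drain(in), StandardCharsets.US_ASCII);
        System.out.println(text + " truncated=" + in.wasTruncated());
    }
}
```

With the flag set, the caller keeps whatever content arrived and can check 
wasTruncated() afterwards; with it cleared, the exception propagates as it does 
today. A wrapper like this only helps if it sits below the HttpResponse level, 
which is why a connection parameter seems the more natural place for the 
switch.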

                
> Make it possible to tolerate truncated chunk streams
> ----------------------------------------------------
>
>                 Key: HTTPCORE-195
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-195
>             Project: HttpComponents HttpCore
>          Issue Type: Improvement
>          Components: HttpCore NIO
>    Affects Versions: 4.0
>            Reporter: Patrick Moore
>            Priority: Minor
>             Fix For: 4.1-alpha1
>
>         Attachments: chunkValidationDecoupling.patch, HTTPCORE-195.patch
>
>
> Our server is webcrawling.
> We are frequently encountering this issue. We think this might be related to 
> something on the server that we are scanning. But that doesn't matter. We 
> need to handle such cases without exceptions. (From my perspective, such 
> things should generate a debug message -- certainly not an exception that 
> ends processing and throws away the retrieved content! )
> http://stuftpizza.com/ seems to reliably result in this problem
> May be TransferEncoding? 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6
> Either way we need to be able to deal with issues on the other servers.
> {{{
> Date  Mon, 20 Apr 2009 03:56:45 GMT
> Server        Apache/2.2.3 (Red Hat)
> Accept-Ranges bytes
> Connection    close
> Transfer-Encoding     chunked
> Content-Type  text/html
> '''Request Headers'''
> Host  stuftpizza.com
> User-Agent    Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.8) Gecko/2009032608 Firefox/3.0.8
> Accept        text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language       en-us,en;q=0.5
> Accept-Encoding       gzip,deflate
> Accept-Charset        ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive    300
> Connection    keep-alive
> Cookie        __utma=47358053.1237981682.1240199754.1240199754.1240199754.1; __utmb=47358053; __utmc=47358053; __utmz=47358053.1240199754.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
> Cache-Control max-age=0
> }}}
> {{{
> 20:51:08,768 INFO  [nioEventListener] Request http://stuftpizza.com/ failed with exception.
> org.apache.http.MalformedChunkCodingException: Truncated chunk
>       at org.apache.http.impl.nio.codecs.ChunkDecoder.read(ChunkDecoder.java:203)
>       at org.apache.http.nio.util.SimpleInputBuffer.consumeContent(SimpleInputBuffer.java:60)
>       at org.apache.http.nio.entity.BufferingNHttpEntity.consumeContent(BufferingNHttpEntity.java:72)
>       at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:236)
>       at org.apache.http.nio.protocol.BufferingHttpClientHandler.inputReady(BufferingHttpClientHandler.java:118)
>       at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:178)
>       at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:146)
>       at com.amplafi.iomanagement.http.UniversalIOEventDispatch.inputReady(UniversalIOEventDispatch.java:133)
>       at $IOEventDispatch_120c19cd1c7.inputReady($IOEventDispatch_120c19cd1c7.java)
>       at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:153)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:314)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:294)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:256)
>       at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:96)
>       at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:556)
>       at java.lang.Thread.run(Thread.java:637)
> }}}
