[ 
https://issues.apache.org/jira/browse/HTTPCORE-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Kalnichevski resolved HTTPCORE-195.
----------------------------------------

    Resolution: Won't Fix

Patrick,

We simply cannot include a workaround for each and every broken CGI script out 
there in the stock version of HttpCore. In this particular case the protocol 
exception is perfectly reasonable because it signals potential data corruption. 
This may not be such a big concern for a web crawler, but it is serious enough 
for other types of applications.

Consider developing a custom implementation of ChunkDecoder which is more 
lenient about HTTP protocol violations that are tolerable for web crawlers.
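
As a rough illustration of what "lenient" could mean here, the sketch below 
decodes a chunked body from a plain blocking InputStream and, when the stream 
ends in the middle of a chunk, returns the bytes received so far together with 
a flag instead of throwing. It is a standalone example only: the class name 
LenientChunkedReader and its Result type are hypothetical, it does not use or 
extend the HttpCore NIO ChunkDecoder API, and it ignores chunk extensions and 
trailers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical standalone sketch; NOT the HttpCore ChunkDecoder API. */
public final class LenientChunkedReader {

    /** Decoded bytes plus a flag telling the caller the body was cut short. */
    public static final class Result {
        public final byte[] content;
        public final boolean truncated;

        Result(byte[] content, boolean truncated) {
            this.content = content;
            this.truncated = truncated;
        }
    }

    /** Decodes a chunked body, keeping partial content if the stream ends early. */
    public static Result decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                // Stream ended where a chunk-size line was expected.
                return new Result(out.toByteArray(), true);
            }
            // Drop any chunk extension after ';' and parse the hex size.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int remaining;
            try {
                remaining = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                // Unparseable size line: stop and report what we have.
                return new Result(out.toByteArray(), true);
            }
            if (remaining < 0) {
                return new Result(out.toByteArray(), true);
            }
            if (remaining == 0) {
                // Last chunk reached; trailers are ignored in this sketch.
                return new Result(out.toByteArray(), false);
            }
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    // Truncated chunk: keep the partial data instead of throwing.
                    return new Result(out.toByteArray(), true);
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
            // Consume the CRLF that terminates the chunk data.
            if (readLine(in) == null) {
                return new Result(out.toByteArray(), true);
            }
        }
    }

    /** Reads up to LF and strips a trailing CR; returns null if EOF comes first. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') {
                String s = new String(line.toByteArray(), StandardCharsets.US_ASCII);
                return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
            }
            line.write(b);
        }
        return null;
    }
}
```

A crawler could log the truncated flag at debug level and still index the 
partial content, while other applications would keep treating truncation as an 
error.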

Oleg

> ChunkDecoder is overly sensitive to truncated chunks
> ----------------------------------------------------
>
>                 Key: HTTPCORE-195
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-195
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore NIO
>    Affects Versions: 4.0
>            Reporter: Patrick Moore
>            Priority: Critical
>
> Our server is a web crawler.
> We frequently encounter this issue. We think it might be related to 
> something on the server that we are scanning, but that doesn't matter: we 
> need to handle such cases without exceptions. (From my perspective, such 
> things should generate a debug message, certainly not an exception that 
> ends processing and throws away the retrieved content!)
> http://stuftpizza.com/ seems to reliably trigger this problem.
> Maybe it is related to Transfer-Encoding? 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6
> Either way, we need to be able to deal with issues on other servers.
> {{{
> Date  Mon, 20 Apr 2009 03:56:45 GMT
> Server        Apache/2.2.3 (Red Hat)
> Accept-Ranges bytes
> Connection    close
> Transfer-Encoding     chunked
> Content-Type  text/html
> '''Request Headers'''
> Host  stuftpizza.com
> User-Agent    Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.8) Gecko/2009032608 Firefox/3.0.8
> Accept        text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language       en-us,en;q=0.5
> Accept-Encoding       gzip,deflate
> Accept-Charset        ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive    300
> Connection    keep-alive
> Cookie        __utma=47358053.1237981682.1240199754.1240199754.1240199754.1; __utmb=47358053; __utmc=47358053; __utmz=47358053.1240199754.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
> Cache-Control max-age=0
> }}}
> {{{
> 20:51:08,768 INFO  [nioEventListener] Request http://stuftpizza.com/ failed with exception.
> org.apache.http.MalformedChunkCodingException: Truncated chunk
>       at org.apache.http.impl.nio.codecs.ChunkDecoder.read(ChunkDecoder.java:203)
>       at org.apache.http.nio.util.SimpleInputBuffer.consumeContent(SimpleInputBuffer.java:60)
>       at org.apache.http.nio.entity.BufferingNHttpEntity.consumeContent(BufferingNHttpEntity.java:72)
>       at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:236)
>       at org.apache.http.nio.protocol.BufferingHttpClientHandler.inputReady(BufferingHttpClientHandler.java:118)
>       at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:178)
>       at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:146)
>       at com.amplafi.iomanagement.http.UniversalIOEventDispatch.inputReady(UniversalIOEventDispatch.java:133)
>       at $IOEventDispatch_120c19cd1c7.inputReady($IOEventDispatch_120c19cd1c7.java)
>       at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:153)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:314)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:294)
>       at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:256)
>       at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:96)
>       at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:556)
>       at java.lang.Thread.run(Thread.java:637)
> }}}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

