Hello,
I have the same problem:
 Error fetching news HTML for: Event[entryData=EntryData(title=Yemeni al-Qaeda branch a magnet for jihadists, url=http://feeds.washingtonpost.com/click.phdo?i=2657414efdbc5d7710278204dac31246)].
java.io.IOException: CRLF expected at end of chunk: 49/54
    at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
    at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
    at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
    at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
    at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
    at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
    at java.io.InputStreamReader.close(InputStreamReader.java:182)
    at net.htmlparser.jericho.Util.getString(Unknown Source)
    at net.htmlparser.jericho.Source.getString(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at ir.ideacenter.biz.service.NewsService.addNewNews(NewsService.java:116)
    at ir.ideacenter.biz.service.NewsService$$FastClassByCGLIB$$fa1e5631.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
    at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:692)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:625)
    at ir.ideacenter.biz.service.NewsService$$EnhancerByCGLIB$$a9589857.addNewNews(<generated>)
    at ir.ideacenter.biz.service.fetcher.NewsFetchListener.contentLoaded(NewsFetchListener.java:65)
    at ir.ideacenter.biz.crawler.AbstractCrawler.fireEvents(AbstractCrawler.java:56)
    at ir.ideacenter.biz.crawler.MultiThreadedNewsCrawler$L2Task.run(MultiThreadedNewsCrawler.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Khosro.




________________________________
From: Oleg Kalnichevski <ol...@apache.org>
To: HttpClient User Discussion <httpclient-users@hc.apache.org>
Sent: Wed, January 20, 2010 8:46:49 AM
Subject: Re: HttpClient does not seem to correctly handle chunked response

On Wed, 2010-01-20 at 08:20 -0800, Royan wrote:
> We have an XML API service which splits the reply XML data into chunks if it
> is larger than a certain number of bytes. Here is a sample of the reply:
> 
> HTTP/1.1 200 OK
> Server: nginx/0.6.35
> Date: Wed, 20 Jan 2010 14:53:27 GMT
> Content-Type: text/xml;charset=UTF-8
> Transfer-Encoding: chunked
> Connection: keep-alive
> X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA
> date=200710221139)/Tomcat-5.5
> Connection: close
> 
> 1f0d
> <?xml version='1.0' encoding='UTF-8'?>
> <root>
> [...]
> <Label><![CDATA[Some character data br
> 2000
> oken in the middle of the string]]></Label>
> [...]
> </root>
> 
> 0
> 
> 
> The problem is that when this XML is retrieved via
> httpResponse.getEntity().getContent(), I expect all chunks to be merged
> into a single XML document with no service information (I'm talking about
> the strange 2000 number appearing in the middle of the string).
> 
> In fact, the returned content is not always parsed correctly and contains
> such service information, which in turn makes my XML parser throw an
> exception.
> 
> httpResponse.getEntity().isChunked() always returns "true".
> 
> Can anyone advise on what I am doing wrong, or otherwise explain how to
> work around this issue?
> 
> Thanks,
> Roman 
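
For reference, a chunked body is a sequence of "<hex size> CRLF <data> CRLF"
records terminated by a zero-size chunk, so lines such as 1f0d (7949 bytes)
and 2000 (8192 bytes) in the sample above are framing, not payload, and a
conforming client strips them before the application sees the content. Below
is a minimal sketch of that decoding step in plain Java. It is not
HttpClient's ChunkedInputStream; the class name ChunkedBodySketch is made up
for illustration, and trailer headers are skipped rather than parsed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative decoder for HTTP/1.1 chunked transfer coding.
// Hypothetical class, NOT the HttpClient implementation.
final class ChunkedBodySketch {

    // Reads a complete chunked body from 'in' and returns the payload bytes
    // with all chunk-size lines and chunk-terminating CRLFs removed.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Ignore optional chunk extensions after ';'.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size;
            try {
                size = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                throw new IOException("Bad chunk size: " + hex);
            }
            if (size == 0) {
                // Last chunk: skip optional trailer lines up to the empty line.
                while (!readLine(in).isEmpty()) {
                    // trailers are ignored in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new IOException("Truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            // Every chunk's data must be followed by CRLF.
            int cr = in.read();
            int lf = in.read();
            if (cr != '\r' || lf != '\n') {
                throw new IOException("CRLF expected at end of chunk: " + cr + "/" + lf);
            }
        }
    }

    // Reads bytes up to the next CRLF and returns them without the CRLF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (prev == '\r' && c == '\n') {
                sb.setLength(sb.length() - 1); // drop the CR already appended
                return sb.toString();
            }
            sb.append((char) c);
            prev = c;
        }
        throw new IOException("Unexpected end of stream in chunk header");
    }

    public static void main(String[] args) throws IOException {
        String wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
        byte[] out = decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints "Hello, world"
    }
}
```

If the declared size does not match the bytes that actually follow, the two
bytes read in place of CR and LF are wrong and decoding fails. The "49/54" in
the "CRLF expected at end of chunk" message above looks like exactly that: 49
and 54 are the ASCII codes of '1' and '6', which suggests the stream was out
of step with the chunk framing at that point.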

I cannot recall a single confirmed problem with the correctness of the
chunk coding code in HttpClient in the 7 (seven) years I have been a
committer on the project.

Double-check your code. If you are reasonably sure this is not an issue
with your code, post a _COMPLETE_ wire / context log of the session and
a test case reproducing the problem (preferably self-contained).
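
In case it helps, here is a sketch of switching on wire logging from code. It
assumes Commons Logging with its bundled SimpleLog backend and no other
logging binding (log4j and the like) taking precedence on the classpath; the
class name WireLogSetup is made up for illustration. The wire logger is
"org.apache.http.wire" for HttpClient 4.x and "httpclient.wire" for Commons
HttpClient 3.x, so the sketch sets both.

```java
// Hypothetical helper: enables HttpClient wire/context logging through
// Commons Logging's SimpleLog. Assumes SimpleLog is the active backend.
final class WireLogSetup {

    static void enableWireLogging() {
        // Route Commons Logging to SimpleLog (writes to System.err).
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");

        // HttpClient 4.x: context and wire loggers.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http", "DEBUG");
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");

        // Commons HttpClient 3.x: context and wire loggers.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient", "debug");
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
    }

    public static void main(String[] args) {
        enableWireLogging();
        System.out.println(System.getProperty("org.apache.commons.logging.Log"));
    }
}
```

Call enableWireLogging() before the first HttpClient class is loaded,
otherwise the loggers may already have been created with their default levels.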

Oleg



