Hello,

I have the same problem:

Error fetching news HTML for: Event[entryData=EntryData(title=Yemeni al-Qaeda branch a magnet for jihadists, url=http://feeds.washingtonpost.com/click.phdo?i=2657414efdbc5d7710278204dac31246)].
java.io.IOException: CRLF expected at end of chunk: 49/54
    at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
    at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
    at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
    at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
    at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
    at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
    at java.io.InputStreamReader.close(InputStreamReader.java:182)
    at net.htmlparser.jericho.Util.getString(Unknown Source)
    at net.htmlparser.jericho.Source.getString(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at net.htmlparser.jericho.Source.<init>(Unknown Source)
    at ir.ideacenter.biz.service.NewsService.addNewNews(NewsService.java:116)
    at ir.ideacenter.biz.service.NewsService$$FastClassByCGLIB$$fa1e5631.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
    at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:692)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:625)
    at ir.ideacenter.biz.service.NewsService$$EnhancerByCGLIB$$a9589857.addNewNews(<generated>)
    at ir.ideacenter.biz.service.fetcher.NewsFetchListener.contentLoaded(NewsFetchListener.java:65)
    at ir.ideacenter.biz.crawler.AbstractCrawler.fireEvents(AbstractCrawler.java:56)
    at ir.ideacenter.biz.crawler.MultiThreadedNewsCrawler$L2Task.run(MultiThreadedNewsCrawler.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Khosro.

________________________________
From: Oleg Kalnichevski <ol...@apache.org>
To: HttpClient User Discussion <httpclient-users@hc.apache.org>
Sent: Wed, January 20, 2010 8:46:49 AM
Subject: Re: HttpClient does not seem to correctly handle chunked response

On Wed, 2010-01-20 at 08:20 -0800, Royan wrote:
> We have an XML API service which splits reply XML data into chunks if it is
> larger than a certain number of bytes. Here is a sample piece of the reply:
>
> HTTP/1.1 200 OK
> Server: nginx/0.6.35
> Date: Wed, 20 Jan 2010 14:53:27 GMT
> Content-Type: text/xml;charset=UTF-8
> Transfer-Encoding: chunked
> Connection: keep-alive
> X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA
> date=200710221139)/Tomcat-5.5
> Connection: close
>
> 1f0d
> <?xml version='1.0' encoding='UTF-8'?>
> <root>
> [...]
> <Label><![CDATA[Some character data br
> 2000
> oken in the middle of the string]]></Label>
> [...]
> <root>
>
> 0
>
>
> The problem is that when this XML is retrieved via
> httpResponse.getEntity().getContent(), I expect all chunks to be merged
> into a single XML document with no service information (I'm talking about the
> strange 2000 number appearing in the middle of the string).
>
> In fact, the returned content is not always correctly parsed and contains such
> service information, which in turn makes my XML parser throw an exception.
>
> httpResponse.getEntity().isChunked() always returns "true".
>
> Can anyone advise on what I am doing wrong, or otherwise provide information
> on how to work around this issue?
>
> Thanks,
> Roman

I cannot recall a single confirmed problem with the correctness of the chunk coding code in HttpClient in the past 7 (seven) years I have been a committer on the project. Double-check your code.

If you are reasonably sure this is not an issue with your code, post a _COMPLETE_ wire / context log of the session and a test case reproducing the problem (preferably self-contained).

Oleg
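
For reference, the chunked framing that both reports revolve around can be sketched in a few lines of Java. The class below is a hypothetical teaching sketch, not HttpClient's ChunkedInputStream: it assumes each chunk is a hexadecimal size line, CRLF, that many bytes of data, and another CRLF, with a zero-size chunk ending the body. Under that framing, size lines such as 1f0d and 2000 in the quoted sample are service information and should not appear in the decoded content. The error text in expectCrlf imitates the message in the stack trace above, on the assumption that the two numbers in it are the byte values found where CR and LF were expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a chunked-body decoder; not HttpClient's implementation. */
final class ChunkedDecoderSketch {

    /** Decodes "<hex size>CRLF<data>CRLF" chunks until a zero-size chunk is read. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Drop optional chunk extensions that may follow a ';'.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Skip optional trailer lines up to the terminating empty line.
                while (!readLine(in).isEmpty()) {
                    // nothing to keep in this sketch
                }
                return body.toByteArray();
            }
            for (int i = 0; i < size; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new IOException("Truncated chunk");
                }
                body.write(b);
            }
            expectCrlf(in);
        }
    }

    /** Reads bytes up to CRLF and returns them as ASCII text without the CRLF. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return new String(line.toByteArray(), StandardCharsets.US_ASCII);
            }
            line.write(b);
        }
        throw new IOException("Unexpected end of stream");
    }

    /** Every data chunk must be followed by CRLF; the message format mimics the trace. */
    private static void expectCrlf(InputStream in) throws IOException {
        int cr = in.read();
        int lf = in.read();
        if (cr != '\r' || lf != '\n') {
            throw new IOException("CRLF expected at end of chunk: " + cr + "/" + lf);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decode(new ByteArrayInputStream(wire));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

In this sketch, a body such as "3\r\nabc16" fails with "CRLF expected at end of chunk: 49/54", because 49 and 54 are the ASCII codes of '1' and '6'. An error of that shape means the declared chunk size did not match the bytes actually on the wire, which is why a complete wire log is needed to tell a server-side framing problem from a client-side one.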