Hi Eric,
Thank you so much for your bug report. I'm forwarding this message to
our official mailing list for better and faster assistance. :)
HTH,
Trustin
On Jan 4, 2008 1:18 PM, Eric Gaumer <[EMAIL PROTECTED]> wrote:
> Hey Trustin,
>
> I'm building an enterprise RSS retriever for submitting RSS feeds to a
> search engine (leveraging meta data found in the channel).
>
> I was working in Python with Twisted but when I came across mina I decided
> to port things over to Java. You've created a nice framework for
> asynchronous sockets.
>
> I'm using a 2.0 snapshot from about a week ago.
>
> The protocol-http-client is geared more for connecting to a single site and
> retrieving multiple pages. For the RSS fetcher, I need to connect to many
> sites and fetch a single page.
>
> I removed the blocking call to connect and added a listener, etc... I am
> using the filter-codec-http package because most of what's being done here
> is over my head right now and I see no reason to re-implement it.
>
> Everything is running great. I'm able to fetch and parse 500 RSS channels in
> about 41s and 4,175 RSS channels in 5m 59s. This is quite good considering
> there is about 1s average latency just in a typical request/response.
>
> The one problem I have noticed is that a fair number of feeds (maybe 10%)
> fail with an IllegalArgumentException. Initially I just removed the offending
> feeds from my tests.
>
> Tonight I had a chance to return to these problematic feeds and investigate
> further. After some tests, I found that they all used Chunked Transfer
> Coding.
>
> Looking at the raw response and reviewing the code in
> src/main/java/org/apache/mina/filter/codec/http/ChunkedBodyDecodingState.java,
> I found that the error was generated because of a space between the chunk
> size and the CR/LF bytes.
>
> For instance:
>
> 8a·(CR)(LF)
>
> Rather than:
>
> 8a(CR)(LF)
>
> I've browsed the RFC but I won't claim to fully understand it all. Should
> whitespace be handled or should an exception be thrown here?
>
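
To make the two cases concrete at the byte level, here is a minimal sketch that dumps the decimal byte values of both chunk-size lines. The class name ChunkSizeBytes and its method are made up for illustration and are not part of MINA. Byte 32 is the ASCII space and byte 13 is CR, which are the values that show up in the debug output below.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of MINA: shows the raw bytes of a chunk-size line.
class ChunkSizeBytes {
    // Converts an ASCII line into its unsigned decimal byte values.
    static int[] toDecimalBytes(String line) {
        byte[] raw = line.getBytes(StandardCharsets.US_ASCII);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        // Chunk size followed by a space, then CR LF.
        System.out.println(Arrays.toString(toDecimalBytes("8a \r\n"))); // [56, 97, 32, 13, 10]
        // Chunk size followed directly by CR LF.
        System.out.println(Arrays.toString(toDecimalBytes("8a\r\n")));  // [56, 97, 13, 10]
    }
}
```
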
> Here is an example with some debug output:
>
> Byte:13
> LENGTH:eb1
> Byte:13
> LENGTH:f48
> Byte:13
> LENGTH:15b
> Byte:13
> LENGTH:d77
> Byte:13
> LENGTH:b81
> Byte:13
> LENGTH:f39
> Byte:13
> LENGTH:d0
> Byte:13
> LENGTH:53b
> Byte:13
> LENGTH:3f30
> Byte:13
> LENGTH:3a1
> Byte:13
> LENGTH:e4c
> Byte:32
> Illegal Argument Here: 32
>
> Callback Exception: org.apache.mina.filter.codec.ProtocolDecoderException:
> java.lang.IllegalArgumentException
>
> So you see that we read "8", "a", and " ", and the space causes an
> exception inside isTerminator().
>
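
The failure described above can be reproduced with a small stand-alone scanner. This is a simplified sketch and not the MINA source: the class StrictChunkSizeScanner and its scan() method are hypothetical, and only the accept/terminate/throw behaviour is modelled on what the report describes.

```java
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical stand-in for a strict chunk-size scanner.
// It is NOT the MINA implementation; it only mirrors the reported behaviour.
class StrictChunkSizeScanner {
    static boolean isHexDigit(byte b) {
        return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'f') || (b >= 'A' && b <= 'F');
    }

    // Returns false while b is part of the chunk size, true when b ends it,
    // and throws for any other byte (including a space).
    static boolean isTerminator(byte b) {
        if (isHexDigit(b)) {
            return false;
        }
        if (b == '\r' || b == ';') {
            return true;
        }
        throw new IllegalArgumentException("Unexpected byte: " + b);
    }

    // Returns the index of the terminating byte, or -1 if more data is needed.
    static int scan(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (isTerminator(data[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] ok = "8a\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] spaced = "8a \r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(scan(ok)); // prints 2
        try {
            scan(spaced);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Unexpected byte: 32"
        }
    }
}
```
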
> This happens on a fair number of sites, so I'm assuming it's valid to have
> whitespace here? The RFC provides an EBNF-style grammar but doesn't
> explicitly mention anything about allowing whitespace here. Yet some
> servers seem to occasionally add this whitespace.
>
> At any rate, here is a simple patch I wrote that seems to make these
> errors disappear.
>
>
> --- ChunkedBodyDecodingState.orig.java 2008-01-03 22:48:14.000000000 -0500
> +++ ChunkedBodyDecodingState.java 2008-01-03 22:49:34.000000000 -0500
> @@ -94,7 +94,7 @@
>                      .throwDecoderException("Expected a chunk length.");
>              }
>
> -            String length = product.getString(asciiDecoder);
> +            String length = product.getString(asciiDecoder).trim();
>              lastChunkLength = Integer.parseInt(length, 16);
>              if (chunkHasExtension) {
>                  return SKIP_CHUNK_EXTENSION;
> @@ -106,7 +106,7 @@
>          @Override
>          protected boolean isTerminator(byte b) {
>              if (!(b >= '0' && b <= '9' || b >= 'a' && b <= 'f' || b >= 'A'
> -                    && b <= 'F')) {
> +                    && b <= 'F' || b == ' ')) {
>                  if (b == '\r' || b == ';') {
>                      chunkHasExtension = b == ';';
>                      return true;
>
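
One thing to note about the patch above: because a space is accepted anywhere in the length token, an input such as "8 a" would pass the scan and then fail later in Integer.parseInt() with a NumberFormatException. As a possible alternative, here is a minimal sketch that tolerates only trailing spaces or tabs after the hex digits. The class LenientChunkSize and its parse() method are hypothetical and operate on a line whose CRLF has already been removed; they are not part of MINA.

```java
// Hypothetical alternative, not the MINA implementation: parses the chunk-size
// field of a chunk header line (CRLF already stripped), allowing trailing
// spaces or tabs but rejecting whitespace inside the hex digits.
class LenientChunkSize {
    static int parse(String line) {
        // A chunk extension, if present, starts at the first ';'.
        int semi = line.indexOf(';');
        String field = semi >= 0 ? line.substring(0, semi) : line;

        // Drop trailing spaces and tabs only.
        int end = field.length();
        while (end > 0 && (field.charAt(end - 1) == ' ' || field.charAt(end - 1) == '\t')) {
            end--;
        }
        String hex = field.substring(0, end);
        if (hex.isEmpty()) {
            throw new IllegalArgumentException("Expected a chunk length.");
        }

        // Validate explicitly so that signs and interior whitespace are rejected.
        for (int i = 0; i < hex.length(); i++) {
            char c = hex.charAt(i);
            boolean isHex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!isHex) {
                throw new IllegalArgumentException("Invalid chunk length: \"" + field + "\"");
            }
        }
        return Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(parse("8a"));         // prints 138
        System.out.println(parse("8a "));        // prints 138
        System.out.println(parse("8a;foo=bar")); // prints 138
    }
}
```

Whether to accept the whitespace at all is a separate question from how to accept it; this sketch only shows one way to keep the tolerance narrow.
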
>
> Maybe this is helpful to you. Thanks again for a great framework. I expect
> to get much use out of it ;-)
>
> -Eric
>
>
>
--
what we call human nature is actually human habit
--
http://gleamynode.net/
--
PGP Key ID: 0x0255ECA6