Bill Barker wrote:
"Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Bill Barker wrote:
"Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Bill Barker wrote:
"Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Bill Barker wrote:
"Remy Maucherat" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Filip Hanik - Dev Lists wrote:
Test Case and 5.5.x patch can be found here.
http://people.apache.org/~fhanik/tomcat/b2c/
This is what is happening
int cnt=conv.read( result, 0, BUFFER_SIZE );
is called with a "while (true)" statement,
When the IntermediateInputStream.read returns -1, the above
statement returns cnt==1.
So to avoid calling conv.read, we must check to see if we have more
bytes to read by implementing the available() method, to avoid the
inputstream ever returning -1.
It's possible, but I have a hard time understanding the issue.
The issue is that InputStreamReader reads 8192 bytes from
IntermediateInputStream on the first go. It then translates them
into 2734 chars, but thinks that the last few bytes represent an
incomplete char, so holds onto them. On the next call,
IntermediateInputStream returns -1, so InputStreamReader outputs the
last char as best it can (resulting in returning 1). Then the
IntermediateInputStream buffer is reset, and it can continue on
reading (but from the wrong position, resulting in corruption).
Filip's patch is inelegant (better would be to use the ByteChunk
sink), but other than my looking for a better way to do it, I can't
come up with the required technical reason to porting the base of it
to 5.5 (of course, I could care less what he does in his sandbox :).
I've committed the fix to 5.5, if you find a more elegant way of
solving the actual problem, feel free to revert it and commit another
fix. I don't care about the how, as long as there is a fix that will
be included in the tag 5.5.25 on Friday
No problem. I can see how to do this better, but I'll wait until the
weekend to commit (since it's not totally trivial, I don't want a
one-day window for regression testing :). That way 5.5.25 can go out
with your patch. It doesn't include the NIO dependancy (which was my
only concern), so it works well enough for me for now.
according to the KISS principle, your fix would have to be less than 4
lines changed to be "more elegant" :)
Yes, it is more than 4 lines, but most of them are deletes :). I've done
it already on my local machine here, in case anybody wants RTC on the
5.5.x branch (and Filip's test case passes with flying colors :). I'm
pretty much sure that there are no regressions for 5.5.x+, but I still
need to look at 3.3.x, and 4.1.x.
If anyone is interested, I can post the patch files. Otherwise, I'll
assume that CTR is still in place, and you can veto it when I commit over
the w/e ;). Of course, if this message was meant as a pre-emptive veto,
then I won't bother.
it's your choice if you want to commit it before or after the tag today.
If you wanna commit it before, then we are counting on your vote :)
I've noticed a problem with using Reader.mark with multi-byte charsets (we
have a hack in place that works for single-byte charsets). I could just
commit what I've got here (which should be no worse than before :), but I'd
like to solve this once and for all first.
Using Filip's example servlet, if you modify it to do:
reader = request.getReader();
+ reader.mark(5); // content length + terminator
while (true) {
int c = reader.read();
if (c == -1 || c == '/')
break;
buf.append((char)c);
}
+ reader.reset(); // throws IOException here
With the current code (and what I have), the first call to reader.read
requests 8192 chars, and produces 2734 chars. The current code then results
in throwing away the last 2729 chars and abandoning the mark. The best I've
got until now preserves the 2729 chars, but still throws away the mark, and
hence still throws an IOE when reset is called.
Long story short, I'm not now sure that I can promise to commit a fix this
weekend :(.
no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't
it just be easier to do
boolean markSupported() { return false; }
and not worry about parsing the bytes correctly during a mark?
the problem you are describing goes beyond that though, so take your time.
Filip
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]