"Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Bill Barker wrote: >> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message >> news:[EMAIL PROTECTED] >> >>> Bill Barker wrote: >>> >>>> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message >>>> news:[EMAIL PROTECTED] >>>> >>>> >>>>> Bill Barker wrote: >>>>> >>>>> >>>>>> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message >>>>>> news:[EMAIL PROTECTED] >>>>>> >>>>>> >>>>>> >>>>>>> Bill Barker wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> "Remy Maucherat" <[EMAIL PROTECTED]> wrote in message >>>>>>>> news:[EMAIL PROTECTED] >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Filip Hanik - Dev Lists wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Test Case and 5.5.x patch can be found here. >>>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/ >>>>>>>>>> >>>>>>>>>> This is what is happening >>>>>>>>>> >>>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE ); >>>>>>>>>> is called with a "while (true)" statement, >>>>>>>>>> >>>>>>>>>> When the IntermediateInputStream.read returns -1, the above >>>>>>>>>> statement returns cnt==1. >>>>>>>>>> So to avoid calling conv.read, we must check to see if we have >>>>>>>>>> more bytes to read by implementing the available() method, to >>>>>>>>>> avoid the inputstream ever returning -1. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> It's possible, but I have a hard time understanding the issue. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> The issue is that InputStreamReader reads 8192 bytes from >>>>>>>> IntermediateInputStream on the first go. It then translates them >>>>>>>> into 2734 chars, but thinks that the last few bytes represent an >>>>>>>> incomplete char, so holds onto them. On the next call, >>>>>>>> IntermediateInputStream returns -1, so InputStreamReader outputs >>>>>>>> the last char as best it can (resulting in returning 1). Then the >>>>>>>> IntermediateInputStream buffer is reset, and it can continue on >>>>>>>> reading (but from the wrong position, resulting in corruption). >>>>>>>> >>>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk >>>>>>>> sink), but other than my looking for a better way to do it, I can't >>>>>>>> come up with the required technical reason to porting the base of >>>>>>>> it to 5.5 (of course, I could care less what he does in his sandbox >>>>>>>> :). >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> I've committed the fix to 5.5, if you find a more elegant way of >>>>>>> solving the actual problem, feel free to revert it and commit >>>>>>> another fix. I don't care about the how, as long as there is a fix >>>>>>> that will be included in the tag 5.5.25 on Friday >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> No problem. I can see how to do this better, but I'll wait until the >>>>>> weekend to commit (since it's not totally trivial, I don't want a >>>>>> one-day window for regression testing :). That way 5.5.25 can go out >>>>>> with your patch. It doesn't include the NIO dependancy (which was my >>>>>> only concern), so it works well enough for me for now. >>>>>> >>>>>> >>>>>> >>>>> according to the KISS principle, your fix would have to be less than 4 >>>>> lines changed to be "more elegant" :) >>>>> >>>>> >>>>> >>>> Yes, it is more than 4 lines, but most of them are deletes :). I've >>>> done it already on my local machine here, in case anybody wants RTC on >>>> the 5.5.x branch (and Filip's test case passes with flying colors :). >>>> I'm pretty much sure that there are no regressions for 5.5.x+, but I >>>> still need to look at 3.3.x, and 4.1.x. >>>> >>>> If anyone is interested, I can post the patch files. Otherwise, I'll >>>> assume that CTR is still in place, and you can veto it when I commit >>>> over the w/e ;). Of course, if this message was meant as a pre-emptive >>>> veto, then I won't bother. >>>> >>>> >>> it's your choice if you want to commit it before or after the tag today. >>> If you wanna commit it before, then we are counting on your vote :) >>> >>> >> >> I've noticed a problem with using Reader.mark with multi-byte charsets >> (we have a hack in place that works for single-byte charsets). I could >> just commit what I've got here (which should be no worse than before :), >> but I'd like to solve this once and for all first. >> >> Using Filip's example servlet, if you modify it to do: >> reader = request.getReader(); >> + reader.mark(5); // content length + terminator >> while (true) { >> int c = reader.read(); >> if (c == -1 || c == '/') >> break; >> buf.append((char)c); >> } >> + reader.reset(); // throws IOException here >> >> With the current code (and what I have), the first call to reader.read >> requests 8192 chars, and produces 2734 chars. The current code then >> results in throwing away the last 2729 chars and abandoning the mark. >> The best I've got until now preserves the 2729 chars, but still throws >> away the mark, and hence still throws an IOE when reset is called. >> >> Long story short, I'm not now sure that I can promise to commit a fix >> this weekend :(. >> > > no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't it > just be easier to do > boolean markSupported() { return false; } > and not worry about parsing the bytes correctly during a mark? > the problem you are describing goes beyond that though, so take your time. >
The problem would be that it would be a regression since probably 4.1.x :(. The patch I just committed is still a bit fragile (since it depends on InputBuffer knowing what CharChunk does internally), but since they are pretty stable classes, I think it should work. Filip's example webapp works like a charm :). > Filip --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]