"Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Bill Barker wrote:
>> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message 
>> news:[EMAIL PROTECTED]
>>
>>> Bill Barker wrote:
>>>
>>>> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message 
>>>> news:[EMAIL PROTECTED]
>>>>
>>>>
>>>>> Bill Barker wrote:
>>>>>
>>>>>
>>>>>> "Filip Hanik - Dev Lists" <[EMAIL PROTECTED]> wrote in message 
>>>>>> news:[EMAIL PROTECTED]
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Bill Barker wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> "Remy Maucherat" <[EMAIL PROTECTED]> wrote in message 
>>>>>>>> news:[EMAIL PROTECTED]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Filip Hanik - Dev Lists wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Test Case and 5.5.x patch can be found here.
>>>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/
>>>>>>>>>>
>>>>>>>>>> This is what is happening
>>>>>>>>>>
>>>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE );
>>>>>>>>>> is called with a "while (true)" statement,
>>>>>>>>>>
>>>>>>>>>> When the IntermediateInputStream.read returns -1, the above 
>>>>>>>>>> statement returns cnt==1.
>>>>>>>>>> So to avoid calling conv.read, we must check to see if we have 
>>>>>>>>>> more bytes to read by implementing the available() method, to 
>>>>>>>>>> avoid the inputstream ever returning -1.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> It's possible, but I have a hard time understanding the issue.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> The issue is that InputStreamReader reads 8192 bytes from 
>>>>>>>> IntermediateInputStream on the first go.  It then translates them 
>>>>>>>> into 2734 chars, but thinks that the last few bytes represent an 
>>>>>>>> incomplete char, so holds onto them.  On the next call, 
>>>>>>>> IntermediateInputStream returns -1, so InputStreamReader outputs 
>>>>>>>> the last char as best it can (resulting in returning 1).  Then the 
>>>>>>>> IntermediateInputStream buffer is reset, and it can continue on 
>>>>>>>> reading (but from the wrong position, resulting in corruption).
>>>>>>>>
>>>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk 
>>>>>>>> sink), but other than my looking for a better way to do it, I can't 
>>>>>>>> come up with the required technical reason to porting the base of 
>>>>>>>> it to 5.5 (of course, I could care less what he does in his sandbox 
>>>>>>>> :).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I've committed the fix to 5.5, if you find a more elegant way of 
>>>>>>> solving the actual problem, feel free to revert it and commit 
>>>>>>> another fix. I don't care about the how, as long as there is a fix 
>>>>>>> that will be included in the tag 5.5.25 on Friday
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> No problem.  I can see how to do this better, but I'll wait until the 
>>>>>> weekend to commit (since it's not totally trivial, I don't want a 
>>>>>> one-day window for regression testing :).  That way 5.5.25 can go out 
>>>>>> with your patch.  It doesn't include the NIO dependancy (which was my 
>>>>>> only concern), so it works well enough for me for now.
>>>>>>
>>>>>>
>>>>>>
>>>>> according to the KISS principle, your fix would have to be less than 4 
>>>>> lines changed to be "more elegant" :)
>>>>>
>>>>>
>>>>>
>>>> Yes, it is more than 4 lines, but most of them are deletes :).  I've 
>>>> done it already on my local machine here, in case anybody wants RTC on 
>>>> the 5.5.x branch (and Filip's test case passes with flying colors :). 
>>>> I'm pretty much sure that there are no regressions for 5.5.x+, but I 
>>>> still need to look at 3.3.x,  and 4.1.x.
>>>>
>>>> If anyone is interested, I can post the patch files.  Otherwise, I'll 
>>>> assume that CTR is still in place, and you can veto it when I commit 
>>>> over the w/e ;).  Of course, if this message was meant as a pre-emptive 
>>>> veto, then I won't bother.
>>>>
>>>>
>>> it's your choice if you want to commit it before or after the tag today.
>>> If you wanna commit it before, then we are counting on your vote :)
>>>
>>>
>>
>> I've noticed a problem with using Reader.mark with multi-byte charsets 
>> (we have a hack in place that works for single-byte charsets).  I could 
>> just commit what I've got here (which should be no worse than before :), 
>> but I'd like to solve this once and for all first.
>>
>> Using Filip's example servlet, if you modify it to do:
>>             reader = request.getReader();
>> +            reader.mark(5);   // content length + terminator
>>             while (true) {
>>                 int c = reader.read();
>>                 if (c == -1 || c == '/')
>>                     break;
>>                 buf.append((char)c);
>>             }
>> +           reader.reset(); // throws IOException here
>>
>> With the current code (and what I have), the first call to reader.read 
>> requests 8192 chars, and produces 2734 chars.  The current code then 
>> results in throwing away the last 2729 chars and abandoning the mark. 
>> The best I've got until now preserves the 2729 chars, but still throws 
>> away the mark, and hence still throws an IOE when reset is called.
>>
>> Long story short, I'm not now sure that I can promise to commit a fix 
>> this weekend :(.
>>
>
> no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't it 
> just be easier to do
> boolean markSupported() { return false; }
> and not worry about parsing the bytes correctly during a mark?
> the problem you are describing goes beyond that though, so take your time.
>

The problem would be that it would be a regression since probably 4.1.x :(. 
The patch I just committed is still a bit fragile (since it depends on 
InputBuffer knowing what CharChunk does internally), but since they are 
pretty stable classes, I think it should work.  Filip's example webapp works 
like a charm :).

> Filip 




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to