Re: RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Paul Sandoz Mon, 22 Jun 2020 10:56:24 -0700

+1

Paul.


> On Jun 22, 2020, at 10:36 AM, Chris Hegarty <chris.hega...@oracle.com> wrote:
> 
> Paul pointed out an issue, off list, where subsequent subranges
> could still result in a call to the intrinsic. That is now fixed in-place
> and the test amended to cover the scenario.
> 
> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ 
> <https://cr.openjdk.java.net/~chegar/8247696/webrev.01/>
> 
> -Chris.
> 
>> On 19 Jun 2020, at 11:56, Chris Hegarty <chris.hega...@oracle.com 
>> <mailto:chris.hega...@oracle.com>> wrote:
>> 
>> Paul, Maurizio,
>> 
>> This version incorporates all feedback so far.
>> 
>> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ 
>> <https://cr.openjdk.java.net/~chegar/8247696/webrev.01/>
>> Results on my machine:
>> 
>> Benchmark                          Mode  Cnt       Score        Error  Units
>> BulkOps.mismatch_large_bytebuffer  avgt   30   88266.728 ?   4083.476  ns/op
>> BulkOps.mismatch_large_segment     avgt   30   86141.343 ?   2450.450  ns/op
>> BulkOps.mismatch_small_bytebuffer  avgt   30       6.360 ?      0.425  ns/op
>> BulkOps.mismatch_small_segment     avgt   30       4.582 ?      1.040  ns/op
>> 
>> -Chris.
>> 
>>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote:
>>> 
>>> Thanks Chris.
>>> 
>>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore 
>>>> <maurizio.cimadam...@oracle.com> wrote:
>>>> 
>>>> Thanks for looking at this Chris
>>>> 
>>>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>>>> Hi Chris,
>>>>> 
>>>>> AbstractMemorySegmentImpl
>>>>> —
>>>>> 
>>>>> In vectorizedMismatchLarge:
>>>>> 
>>>>> 163             if (remaining > 7)
>>>>> 164                 throw new InternalError("remaining greater than 7: " 
>>>>> + remaining);
>>>>> 165             i = length - remaining;
>>>>> 166         }
>>>>> 
>>>>> Should this check be an assert?
>>>> I suggested that to Chris, since sometimes asserts are enabled when 
>>>> running tests, sometimes are not - in langtools we moved away from using 
>>>> asserts as we realized that in certain cases they were silently failing. 
>>>> I'm ok with whatever standard you feel comfortable with though.
>>> 
>>> Automated running of tests enable assertions (-ea -esa), perhaps the use is 
>>> more commonly accepted in library code than compiler code. I would favor so 
>>> in this case if necessary (sometimes they are avoided to reduce inline 
>>> thresholds).
>>> 
>>> 
>>>>> 
>>>>> —
>>>>> 
>>>>> This fix prompted me to think more deeply about the implementation of 
>>>>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I 
>>>>> should have thought about this more throughly earlier on.
>>>>> 
>>>>> We need to refine the approach, not for this patch, but something to 
>>>>> follow up after. I think there are two issues.
>>>>> 
>>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at 
>>>>> any point and return the bitwise compliment of the remaining number of 
>>>>> elements to check.
>>>>> 
>>>>> Obviously, there is no point doing that arbitrarily but a stub 
>>>>> implementation for, say, x86 AVX-512 might decide bomb out for  < 64 
>>>>> remaining elements, rather than apply vector operations on smaller vector 
>>>>> sizes or use a scalar loop. It does not today, but I think we should 
>>>>> guard against that potentially happening, otherwise bad things can happen.
>>>> 
>>>> So your worry here is that we'll end up with an infinite loop, right?
>>>> 
>>> 
>>> Or more likely that an incorrect result is returned since tail elements 
>>> will be skipped over as the offset and size is updated.
>>> 
>>> 
>>>> If so, we could check remaining against previous remaining and bomb out 
>>>> too if no further progress seem to be made?
>>>> 
>>> 
>>> I think it's better to always terminate the loop after the last sub-range 
>>> is checked, rather than unnecessarily calling twice.
>>> 
>>> I am not confident the vectorizedMismatch intrinsic stub has been tested 
>>> properly on very small lengths, since it's never directly called in such 
>>> cases, so also keeping "remaining > 7" is good too. 
>>> 
>>> Paul.
>>> 
>>>>> 
>>>>> I think the loop should exit when the last sub-range has been checked. We 
>>>>> should rely on other tests to ensure the intrinsic method is operating 
>>>>> efficiently.
>>>>> 
>>>>> 
>>>>> 2) This method only works when operating on byte arrays. It will not work 
>>>>> correctly if operating on short or long arrays, since we are not 
>>>>> adjusting the length and offsets accordingly by the scale. It's probably 
>>>>> easier to just rename this as vectorizedMismatchLargeForBytes and drop 
>>>>> the log2ArrayIndexScale argument. Then expand later if need be. I still 
>>>>> think the method rightly belongs in ArraysSupport.
>>>> 
>>>> Yep - probably good idea to restrict on bytes, for now.
>>>> 
>>>> Maurizio
>>>> 
>>>>> 
>>>>> Paul.
>>>>> 
>>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> The MemorySegment::mismatch implementation added vectorized mismatch of 
>>>>>> long sizes. The implementation is trivial, but the starting point for a 
>>>>>> more optimized implementation, if needed. 
>>>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise 
>>>>>> complement of the offset of the first mismatch, where it should return 
>>>>>> the bitwise complement of the number of remaining pairs of elements to 
>>>>>> be checked in the tail of the two arrays. The 
>>>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it 
>>>>>> seamlessly compared the remaining tail, which is larger than it should 
>>>>>> be.
>>>>>> 
>>>>>> Webrev:
>>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>>>> 
>>>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here 
>>>>>> are the results, compared to ByteBuffer::mismatch, on my machine:
>>>>>> 
>>>>>> Benchmark                          Mode  Cnt        Score        Error  
>>>>>> Units
>>>>>> BulkOps.mismatch_large_bytebuffer  avgt   30   740186.973 ? 119314.207  
>>>>>> ns/op
>>>>>> BulkOps.mismatch_large_segment     avgt   30   683475.305 ?  76355.043  
>>>>>> ns/op
>>>>>> BulkOps.mismatch_small_bytebuffer  avgt   30        7.367 ?      0.523  
>>>>>> ns/op
>>>>>> BulkOps.mismatch_small_segment     avgt   30        4.140 ?      0.602  
>>>>>> ns/op
>>>>>> 
>>>>>> 
>>>>>> -Chris.
>> 
>

Re: RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Reply via email to