Re: RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Chris Hegarty Fri, 19 Jun 2020 03:57:15 -0700

Paul, Maurizio,

This version incorporates all feedback so far.


https://cr.openjdk.java.net/~chegar/8247696/webrev.01/
Results on my machine:

Benchmark                          Mode  Cnt       Score        Error  Units
BulkOps.mismatch_large_bytebuffer  avgt   30   88266.728 ?   4083.476  ns/op
BulkOps.mismatch_large_segment     avgt   30   86141.343 ?   2450.450  ns/op
BulkOps.mismatch_small_bytebuffer  avgt   30       6.360 ?      0.425  ns/op
BulkOps.mismatch_small_segment     avgt   30       4.582 ?      1.040  ns/op

-Chris.

> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote:
> 
> Thanks Chris.
> 
>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore 
>> <maurizio.cimadam...@oracle.com> wrote:
>> 
>> Thanks for looking at this Chris
>> 
>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>> Hi Chris,
>>> 
>>> AbstractMemorySegmentImpl
>>> —
>>> 
>>> In vectorizedMismatchLarge:
>>> 
>>> 163             if (remaining > 7)
>>> 164                 throw new InternalError("remaining greater than 7: " + 
>>> remaining);
>>> 165             i = length - remaining;
>>> 166         }
>>> 
>>> Should this check be an assert?
>> I suggested that to Chris, since sometimes asserts are enabled when running 
>> tests, sometimes are not - in langtools we moved away from using asserts as 
>> we realized that in certain cases they were silently failing. I'm ok with 
>> whatever standard you feel comfortable with though.
> 
> Automated running of tests enable assertions (-ea -esa), perhaps the use is 
> more commonly accepted in library code than compiler code. I would favor so 
> in this case if necessary (sometimes they are avoided to reduce inline 
> thresholds).
> 
> 
>>> 
>>> —
>>> 
>>> This fix prompted me to think more deeply about the implementation of 
>>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I should 
>>> have thought about this more throughly earlier on.
>>> 
>>> We need to refine the approach, not for this patch, but something to follow 
>>> up after. I think there are two issues.
>>> 
>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at 
>>> any point and return the bitwise compliment of the remaining number of 
>>> elements to check.
>>> 
>>> Obviously, there is no point doing that arbitrarily but a stub 
>>> implementation for, say, x86 AVX-512 might decide bomb out for  < 64 
>>> remaining elements, rather than apply vector operations on smaller vector 
>>> sizes or use a scalar loop. It does not today, but I think we should guard 
>>> against that potentially happening, otherwise bad things can happen.
>> 
>> So your worry here is that we'll end up with an infinite loop, right?
>> 
> 
> Or more likely that an incorrect result is returned since tail elements will 
> be skipped over as the offset and size is updated.
> 
> 
>> If so, we could check remaining against previous remaining and bomb out too 
>> if no further progress seem to be made?
>> 
> 
> I think it's better to always terminate the loop after the last sub-range is 
> checked, rather than unnecessarily calling twice.
> 
> I am not confident the vectorizedMismatch intrinsic stub has been tested 
> properly on very small lengths, since it's never directly called in such 
> cases, so also keeping "remaining > 7" is good too. 
> 
> Paul.
> 
>>> 
>>> I think the loop should exit when the last sub-range has been checked. We 
>>> should rely on other tests to ensure the intrinsic method is operating 
>>> efficiently.
>>> 
>>> 
>>> 2) This method only works when operating on byte arrays. It will not work 
>>> correctly if operating on short or long arrays, since we are not adjusting 
>>> the length and offsets accordingly by the scale. It's probably easier to 
>>> just rename this as vectorizedMismatchLargeForBytes and drop the 
>>> log2ArrayIndexScale argument. Then expand later if need be. I still think 
>>> the method rightly belongs in ArraysSupport.
>> 
>> Yep - probably good idea to restrict on bytes, for now.
>> 
>> Maurizio
>> 
>>> 
>>> Paul.
>>> 
>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> 
>>>> wrote:
>>>> 
>>>> The MemorySegment::mismatch implementation added vectorized mismatch of 
>>>> long sizes. The implementation is trivial, but the starting point for a 
>>>> more optimized implementation, if needed. 
>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise 
>>>> complement of the offset of the first mismatch, where it should return the 
>>>> bitwise complement of the number of remaining pairs of elements to be 
>>>> checked in the tail of the two arrays. The 
>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it 
>>>> seamlessly compared the remaining tail, which is larger than it should be.
>>>> 
>>>> Webrev:
>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>> 
>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here are 
>>>> the results, compared to ByteBuffer::mismatch, on my machine:
>>>> 
>>>> Benchmark                          Mode  Cnt        Score        Error  
>>>> Units
>>>> BulkOps.mismatch_large_bytebuffer  avgt   30   740186.973 ? 119314.207  
>>>> ns/op
>>>> BulkOps.mismatch_large_segment     avgt   30   683475.305 ?  76355.043  
>>>> ns/op
>>>> BulkOps.mismatch_small_bytebuffer  avgt   30        7.367 ?      0.523  
>>>> ns/op
>>>> BulkOps.mismatch_small_segment     avgt   30        4.140 ?      0.602  
>>>> ns/op
>>>> 
>>>> 
>>>> -Chris.

Re: RFR [15] 8247696: Incorrect tail computation for large segments in AbstractMemorySegmentImpl::mismatch

Reply via email to