Paul pointed out an issue, off list, where subsequent subranges
could still result in a call to the intrinsic. That is now fixed in-place
and the test amended to cover the scenario.

https://cr.openjdk.java.net/~chegar/8247696/webrev.01/

-Chris.

> On 19 Jun 2020, at 11:56, Chris Hegarty <chris.hega...@oracle.com> wrote:
> 
> Paul, Maurizio,
> 
> This version incorporates all feedback so far.
> 
> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/
> Results on my machine:
> 
> Benchmark                          Mode  Cnt       Score        Error  Units
> BulkOps.mismatch_large_bytebuffer  avgt   30   88266.728 ?   4083.476  ns/op
> BulkOps.mismatch_large_segment     avgt   30   86141.343 ?   2450.450  ns/op
> BulkOps.mismatch_small_bytebuffer  avgt   30       6.360 ?      0.425  ns/op
> BulkOps.mismatch_small_segment     avgt   30       4.582 ?      1.040  ns/op
> 
> -Chris.
> 
>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote:
>> 
>> Thanks Chris.
>> 
>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore 
>>> <maurizio.cimadam...@oracle.com> wrote:
>>> 
>>> Thanks for looking at this Chris
>>> 
>>> On 17/06/2020 21:56, Paul Sandoz wrote:
>>>> Hi Chris,
>>>> 
>>>> AbstractMemorySegmentImpl
>>>> —
>>>> 
>>>> In vectorizedMismatchLarge:
>>>> 
>>>> 163             if (remaining > 7)
>>>> 164                 throw new InternalError("remaining greater than 7: " + 
>>>> remaining);
>>>> 165             i = length - remaining;
>>>> 166         }
>>>> 
>>>> Should this check be an assert?
>>> I suggested that to Chris, since sometimes asserts are enabled when running 
>>> tests, sometimes are not - in langtools we moved away from using asserts as 
>>> we realized that in certain cases they were silently failing. I'm ok with 
>>> whatever standard you feel comfortable with though.
>> 
>> Automated running of tests enable assertions (-ea -esa), perhaps the use is 
>> more commonly accepted in library code than compiler code. I would favor so 
>> in this case if necessary (sometimes they are avoided to reduce inline 
>> thresholds).
>> 
>> 
>>>> 
>>>> —
>>>> 
>>>> This fix prompted me to think more deeply about the implementation of 
>>>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I should 
>>>> have thought about this more throughly earlier on.
>>>> 
>>>> We need to refine the approach, not for this patch, but something to 
>>>> follow up after. I think there are two issues.
>>>> 
>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at 
>>>> any point and return the bitwise compliment of the remaining number of 
>>>> elements to check.
>>>> 
>>>> Obviously, there is no point doing that arbitrarily but a stub 
>>>> implementation for, say, x86 AVX-512 might decide bomb out for  < 64 
>>>> remaining elements, rather than apply vector operations on smaller vector 
>>>> sizes or use a scalar loop. It does not today, but I think we should guard 
>>>> against that potentially happening, otherwise bad things can happen.
>>> 
>>> So your worry here is that we'll end up with an infinite loop, right?
>>> 
>> 
>> Or more likely that an incorrect result is returned since tail elements will 
>> be skipped over as the offset and size is updated.
>> 
>> 
>>> If so, we could check remaining against previous remaining and bomb out too 
>>> if no further progress seem to be made?
>>> 
>> 
>> I think it's better to always terminate the loop after the last sub-range is 
>> checked, rather than unnecessarily calling twice.
>> 
>> I am not confident the vectorizedMismatch intrinsic stub has been tested 
>> properly on very small lengths, since it's never directly called in such 
>> cases, so also keeping "remaining > 7" is good too. 
>> 
>> Paul.
>> 
>>>> 
>>>> I think the loop should exit when the last sub-range has been checked. We 
>>>> should rely on other tests to ensure the intrinsic method is operating 
>>>> efficiently.
>>>> 
>>>> 
>>>> 2) This method only works when operating on byte arrays. It will not work 
>>>> correctly if operating on short or long arrays, since we are not adjusting 
>>>> the length and offsets accordingly by the scale. It's probably easier to 
>>>> just rename this as vectorizedMismatchLargeForBytes and drop the 
>>>> log2ArrayIndexScale argument. Then expand later if need be. I still think 
>>>> the method rightly belongs in ArraysSupport.
>>> 
>>> Yep - probably good idea to restrict on bytes, for now.
>>> 
>>> Maurizio
>>> 
>>>> 
>>>> Paul.
>>>> 
>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> 
>>>>> wrote:
>>>>> 
>>>>> The MemorySegment::mismatch implementation added vectorized mismatch of 
>>>>> long sizes. The implementation is trivial, but the starting point for a 
>>>>> more optimized implementation, if needed. 
>>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise 
>>>>> complement of the offset of the first mismatch, where it should return 
>>>>> the bitwise complement of the number of remaining pairs of elements to be 
>>>>> checked in the tail of the two arrays. The 
>>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it 
>>>>> seamlessly compared the remaining tail, which is larger than it should be.
>>>>> 
>>>>> Webrev:
>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/
>>>>> 
>>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here are 
>>>>> the results, compared to ByteBuffer::mismatch, on my machine:
>>>>> 
>>>>> Benchmark                          Mode  Cnt        Score        Error  
>>>>> Units
>>>>> BulkOps.mismatch_large_bytebuffer  avgt   30   740186.973 ? 119314.207  
>>>>> ns/op
>>>>> BulkOps.mismatch_large_segment     avgt   30   683475.305 ?  76355.043  
>>>>> ns/op
>>>>> BulkOps.mismatch_small_bytebuffer  avgt   30        7.367 ?      0.523  
>>>>> ns/op
>>>>> BulkOps.mismatch_small_segment     avgt   30        4.140 ?      0.602  
>>>>> ns/op
>>>>> 
>>>>> 
>>>>> -Chris.
> 

Reply via email to