Paul, Maurizio, This version incorporates all feedback so far.
https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ Results on my machine: Benchmark Mode Cnt Score Error Units BulkOps.mismatch_large_bytebuffer avgt 30 88266.728 ? 4083.476 ns/op BulkOps.mismatch_large_segment avgt 30 86141.343 ? 2450.450 ns/op BulkOps.mismatch_small_bytebuffer avgt 30 6.360 ? 0.425 ns/op BulkOps.mismatch_small_segment avgt 30 4.582 ? 1.040 ns/op -Chris. > On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote: > > Thanks Chris. > >> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore >> <maurizio.cimadam...@oracle.com> wrote: >> >> Thanks for looking at this Chris >> >> On 17/06/2020 21:56, Paul Sandoz wrote: >>> Hi Chris, >>> >>> AbstractMemorySegmentImpl >>> — >>> >>> In vectorizedMismatchLarge: >>> >>> 163 if (remaining > 7) >>> 164 throw new InternalError("remaining greater than 7: " + >>> remaining); >>> 165 i = length - remaining; >>> 166 } >>> >>> Should this check be an assert? >> I suggested that to Chris, since sometimes asserts are enabled when running >> tests, sometimes are not - in langtools we moved away from using asserts as >> we realized that in certain cases they were silently failing. I'm ok with >> whatever standard you feel comfortable with though. > > Automated running of tests enable assertions (-ea -esa), perhaps the use is > more commonly accepted in library code than compiler code. I would favor so > in this case if necessary (sometimes they are avoided to reduce inline > thresholds). > > >>> >>> — >>> >>> This fix prompted me to think more deeply about the implementation of >>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I should >>> have thought about this more throughly earlier on. >>> >>> We need to refine the approach, not for this patch, but something to follow >>> up after. I think there are two issues. >>> >>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at >>> any point and return the bitwise compliment of the remaining number of >>> elements to check. >>> >>> Obviously, there is no point doing that arbitrarily but a stub >>> implementation for, say, x86 AVX-512 might decide bomb out for < 64 >>> remaining elements, rather than apply vector operations on smaller vector >>> sizes or use a scalar loop. It does not today, but I think we should guard >>> against that potentially happening, otherwise bad things can happen. >> >> So your worry here is that we'll end up with an infinite loop, right? >> > > Or more likely that an incorrect result is returned since tail elements will > be skipped over as the offset and size is updated. > > >> If so, we could check remaining against previous remaining and bomb out too >> if no further progress seem to be made? >> > > I think it's better to always terminate the loop after the last sub-range is > checked, rather than unnecessarily calling twice. > > I am not confident the vectorizedMismatch intrinsic stub has been tested > properly on very small lengths, since it's never directly called in such > cases, so also keeping "remaining > 7" is good too. > > Paul. > >>> >>> I think the loop should exit when the last sub-range has been checked. We >>> should rely on other tests to ensure the intrinsic method is operating >>> efficiently. >>> >>> >>> 2) This method only works when operating on byte arrays. It will not work >>> correctly if operating on short or long arrays, since we are not adjusting >>> the length and offsets accordingly by the scale. It's probably easier to >>> just rename this as vectorizedMismatchLargeForBytes and drop the >>> log2ArrayIndexScale argument. Then expand later if need be. I still think >>> the method rightly belongs in ArraysSupport. >> >> Yep - probably good idea to restrict on bytes, for now. >> >> Maurizio >> >>> >>> Paul. >>> >>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> >>>> wrote: >>>> >>>> The MemorySegment::mismatch implementation added vectorized mismatch of >>>> long sizes. The implementation is trivial, but the starting point for a >>>> more optimized implementation, if needed. >>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise >>>> complement of the offset of the first mismatch, where it should return the >>>> bitwise complement of the number of remaining pairs of elements to be >>>> checked in the tail of the two arrays. The >>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it >>>> seamlessly compared the remaining tail, which is larger than it should be. >>>> >>>> Webrev: >>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/ >>>> >>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here are >>>> the results, compared to ByteBuffer::mismatch, on my machine: >>>> >>>> Benchmark Mode Cnt Score Error >>>> Units >>>> BulkOps.mismatch_large_bytebuffer avgt 30 740186.973 ? 119314.207 >>>> ns/op >>>> BulkOps.mismatch_large_segment avgt 30 683475.305 ? 76355.043 >>>> ns/op >>>> BulkOps.mismatch_small_bytebuffer avgt 30 7.367 ? 0.523 >>>> ns/op >>>> BulkOps.mismatch_small_segment avgt 30 4.140 ? 0.602 >>>> ns/op >>>> >>>> >>>> -Chris.