+1 Paul.
> On Jun 22, 2020, at 10:36 AM, Chris Hegarty <chris.hega...@oracle.com> wrote: > > Paul pointed out an issue, off list, where subsequent subranges > could still result in a call to the intrinsic. That is now fixed in-place > and the test amended to cover the scenario. > > https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ > <https://cr.openjdk.java.net/~chegar/8247696/webrev.01/> > > -Chris. > >> On 19 Jun 2020, at 11:56, Chris Hegarty <chris.hega...@oracle.com >> <mailto:chris.hega...@oracle.com>> wrote: >> >> Paul, Maurizio, >> >> This version incorporates all feedback so far. >> >> https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ >> <https://cr.openjdk.java.net/~chegar/8247696/webrev.01/> >> Results on my machine: >> >> Benchmark Mode Cnt Score Error Units >> BulkOps.mismatch_large_bytebuffer avgt 30 88266.728 ? 4083.476 ns/op >> BulkOps.mismatch_large_segment avgt 30 86141.343 ? 2450.450 ns/op >> BulkOps.mismatch_small_bytebuffer avgt 30 6.360 ? 0.425 ns/op >> BulkOps.mismatch_small_segment avgt 30 4.582 ? 1.040 ns/op >> >> -Chris. >> >>> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote: >>> >>> Thanks Chris. >>> >>>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore >>>> <maurizio.cimadam...@oracle.com> wrote: >>>> >>>> Thanks for looking at this Chris >>>> >>>> On 17/06/2020 21:56, Paul Sandoz wrote: >>>>> Hi Chris, >>>>> >>>>> AbstractMemorySegmentImpl >>>>> — >>>>> >>>>> In vectorizedMismatchLarge: >>>>> >>>>> 163 if (remaining > 7) >>>>> 164 throw new InternalError("remaining greater than 7: " >>>>> + remaining); >>>>> 165 i = length - remaining; >>>>> 166 } >>>>> >>>>> Should this check be an assert? >>>> I suggested that to Chris, since sometimes asserts are enabled when >>>> running tests, sometimes are not - in langtools we moved away from using >>>> asserts as we realized that in certain cases they were silently failing. >>>> I'm ok with whatever standard you feel comfortable with though. >>> >>> Automated running of tests enable assertions (-ea -esa), perhaps the use is >>> more commonly accepted in library code than compiler code. I would favor so >>> in this case if necessary (sometimes they are avoided to reduce inline >>> thresholds). >>> >>> >>>>> >>>>> — >>>>> >>>>> This fix prompted me to think more deeply about the implementation of >>>>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I >>>>> should have thought about this more throughly earlier on. >>>>> >>>>> We need to refine the approach, not for this patch, but something to >>>>> follow up after. I think there are two issues. >>>>> >>>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at >>>>> any point and return the bitwise compliment of the remaining number of >>>>> elements to check. >>>>> >>>>> Obviously, there is no point doing that arbitrarily but a stub >>>>> implementation for, say, x86 AVX-512 might decide bomb out for < 64 >>>>> remaining elements, rather than apply vector operations on smaller vector >>>>> sizes or use a scalar loop. It does not today, but I think we should >>>>> guard against that potentially happening, otherwise bad things can happen. >>>> >>>> So your worry here is that we'll end up with an infinite loop, right? >>>> >>> >>> Or more likely that an incorrect result is returned since tail elements >>> will be skipped over as the offset and size is updated. >>> >>> >>>> If so, we could check remaining against previous remaining and bomb out >>>> too if no further progress seem to be made? >>>> >>> >>> I think it's better to always terminate the loop after the last sub-range >>> is checked, rather than unnecessarily calling twice. >>> >>> I am not confident the vectorizedMismatch intrinsic stub has been tested >>> properly on very small lengths, since it's never directly called in such >>> cases, so also keeping "remaining > 7" is good too. >>> >>> Paul. >>> >>>>> >>>>> I think the loop should exit when the last sub-range has been checked. We >>>>> should rely on other tests to ensure the intrinsic method is operating >>>>> efficiently. >>>>> >>>>> >>>>> 2) This method only works when operating on byte arrays. It will not work >>>>> correctly if operating on short or long arrays, since we are not >>>>> adjusting the length and offsets accordingly by the scale. It's probably >>>>> easier to just rename this as vectorizedMismatchLargeForBytes and drop >>>>> the log2ArrayIndexScale argument. Then expand later if need be. I still >>>>> think the method rightly belongs in ArraysSupport. >>>> >>>> Yep - probably good idea to restrict on bytes, for now. >>>> >>>> Maurizio >>>> >>>>> >>>>> Paul. >>>>> >>>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> >>>>>> wrote: >>>>>> >>>>>> The MemorySegment::mismatch implementation added vectorized mismatch of >>>>>> long sizes. The implementation is trivial, but the starting point for a >>>>>> more optimized implementation, if needed. >>>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise >>>>>> complement of the offset of the first mismatch, where it should return >>>>>> the bitwise complement of the number of remaining pairs of elements to >>>>>> be checked in the tail of the two arrays. The >>>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it >>>>>> seamlessly compared the remaining tail, which is larger than it should >>>>>> be. >>>>>> >>>>>> Webrev: >>>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/ >>>>>> >>>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here >>>>>> are the results, compared to ByteBuffer::mismatch, on my machine: >>>>>> >>>>>> Benchmark Mode Cnt Score Error >>>>>> Units >>>>>> BulkOps.mismatch_large_bytebuffer avgt 30 740186.973 ? 119314.207 >>>>>> ns/op >>>>>> BulkOps.mismatch_large_segment avgt 30 683475.305 ? 76355.043 >>>>>> ns/op >>>>>> BulkOps.mismatch_small_bytebuffer avgt 30 7.367 ? 0.523 >>>>>> ns/op >>>>>> BulkOps.mismatch_small_segment avgt 30 4.140 ? 0.602 >>>>>> ns/op >>>>>> >>>>>> >>>>>> -Chris. >> >