Paul pointed out an issue, off list, where subsequent subranges could still result in a call to the intrinsic. That is now fixed in-place and the test amended to cover the scenario.
https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ -Chris. > On 19 Jun 2020, at 11:56, Chris Hegarty <chris.hega...@oracle.com> wrote: > > Paul, Maurizio, > > This version incorporates all feedback so far. > > https://cr.openjdk.java.net/~chegar/8247696/webrev.01/ > Results on my machine: > > Benchmark Mode Cnt Score Error Units > BulkOps.mismatch_large_bytebuffer avgt 30 88266.728 ? 4083.476 ns/op > BulkOps.mismatch_large_segment avgt 30 86141.343 ? 2450.450 ns/op > BulkOps.mismatch_small_bytebuffer avgt 30 6.360 ? 0.425 ns/op > BulkOps.mismatch_small_segment avgt 30 4.582 ? 1.040 ns/op > > -Chris. > >> On 19 Jun 2020, at 00:35, Paul Sandoz <paul.san...@oracle.com> wrote: >> >> Thanks Chris. >> >>> On Jun 18, 2020, at 2:57 AM, Maurizio Cimadamore >>> <maurizio.cimadam...@oracle.com> wrote: >>> >>> Thanks for looking at this Chris >>> >>> On 17/06/2020 21:56, Paul Sandoz wrote: >>>> Hi Chris, >>>> >>>> AbstractMemorySegmentImpl >>>> — >>>> >>>> In vectorizedMismatchLarge: >>>> >>>> 163 if (remaining > 7) >>>> 164 throw new InternalError("remaining greater than 7: " + >>>> remaining); >>>> 165 i = length - remaining; >>>> 166 } >>>> >>>> Should this check be an assert? >>> I suggested that to Chris, since sometimes asserts are enabled when running >>> tests, sometimes are not - in langtools we moved away from using asserts as >>> we realized that in certain cases they were silently failing. I'm ok with >>> whatever standard you feel comfortable with though. >> >> Automated running of tests enable assertions (-ea -esa), perhaps the use is >> more commonly accepted in library code than compiler code. I would favor so >> in this case if necessary (sometimes they are avoided to reduce inline >> thresholds). >> >> >>>> >>>> — >>>> >>>> This fix prompted me to think more deeply about the implementation of >>>> vectorizedMismatchLarge and its use of vectorizedMismatch. Sorry, I should >>>> have thought about this more throughly earlier on. >>>> >>>> We need to refine the approach, not for this patch, but something to >>>> follow up after. I think there are two issues. >>>> >>>> 1) The intrinsic method vectorizedMismatch could potentially bomb out at >>>> any point and return the bitwise compliment of the remaining number of >>>> elements to check. >>>> >>>> Obviously, there is no point doing that arbitrarily but a stub >>>> implementation for, say, x86 AVX-512 might decide bomb out for < 64 >>>> remaining elements, rather than apply vector operations on smaller vector >>>> sizes or use a scalar loop. It does not today, but I think we should guard >>>> against that potentially happening, otherwise bad things can happen. >>> >>> So your worry here is that we'll end up with an infinite loop, right? >>> >> >> Or more likely that an incorrect result is returned since tail elements will >> be skipped over as the offset and size is updated. >> >> >>> If so, we could check remaining against previous remaining and bomb out too >>> if no further progress seem to be made? >>> >> >> I think it's better to always terminate the loop after the last sub-range is >> checked, rather than unnecessarily calling twice. >> >> I am not confident the vectorizedMismatch intrinsic stub has been tested >> properly on very small lengths, since it's never directly called in such >> cases, so also keeping "remaining > 7" is good too. >> >> Paul. >> >>>> >>>> I think the loop should exit when the last sub-range has been checked. We >>>> should rely on other tests to ensure the intrinsic method is operating >>>> efficiently. >>>> >>>> >>>> 2) This method only works when operating on byte arrays. It will not work >>>> correctly if operating on short or long arrays, since we are not adjusting >>>> the length and offsets accordingly by the scale. It's probably easier to >>>> just rename this as vectorizedMismatchLargeForBytes and drop the >>>> log2ArrayIndexScale argument. Then expand later if need be. I still think >>>> the method rightly belongs in ArraysSupport. >>> >>> Yep - probably good idea to restrict on bytes, for now. >>> >>> Maurizio >>> >>>> >>>> Paul. >>>> >>>>> On Jun 17, 2020, at 8:33 AM, Chris Hegarty <chris.hega...@oracle.com> >>>>> wrote: >>>>> >>>>> The MemorySegment::mismatch implementation added vectorized mismatch of >>>>> long sizes. The implementation is trivial, but the starting point for a >>>>> more optimized implementation, if needed. >>>>> ArraysSupport::vectorizedMismatchLarge incorrectly returns the bitwise >>>>> complement of the offset of the first mismatch, where it should return >>>>> the bitwise complement of the number of remaining pairs of elements to be >>>>> checked in the tail of the two arrays. The >>>>> AbstractMemorySegmentImpl::mismatch masked this problem, since it >>>>> seamlessly compared the remaining tail, which is larger than it should be. >>>>> >>>>> Webrev: >>>>> https://cr.openjdk.java.net/~chegar/8247696/webrev.00/ >>>>> >>>>> I updated the exiting BulkOps micro-benchmark to cover mismatch. Here are >>>>> the results, compared to ByteBuffer::mismatch, on my machine: >>>>> >>>>> Benchmark Mode Cnt Score Error >>>>> Units >>>>> BulkOps.mismatch_large_bytebuffer avgt 30 740186.973 ? 119314.207 >>>>> ns/op >>>>> BulkOps.mismatch_large_segment avgt 30 683475.305 ? 76355.043 >>>>> ns/op >>>>> BulkOps.mismatch_small_bytebuffer avgt 30 7.367 ? 0.523 >>>>> ns/op >>>>> BulkOps.mismatch_small_segment avgt 30 4.140 ? 0.602 >>>>> ns/op >>>>> >>>>> >>>>> -Chris. >