On Wed, 14 Oct 2020 21:29:55 GMT, CoreyAshford <github.com+51754783+coreyashf...@openjdk.org> wrote:
>> Hi Corey, >> >>> Are you thinking of a case where that produces a higher iteration count? >> Sorry for the confusion. This is also fine: >> length = sl - sp - 12 >> i = length / block_size >> if (i <= 0) return 0 >> >> But I still wonder why we should use 2 branches. Why not >> srawi_ >> ble(CCR0, return_zero) >> ? >> >>> Ah, I should have checked the calling conventions. I thought all of the CR* >>> regs are volatile. I will fix that. >> Actually, we do save and restore all CRs, so it’s not a real problem with >> the current implementation. But I prefer >> staying closer to the elf ABI as long as there’s no good reason to do it >> differently. >>> Your original comment said "2nd review", so I thought you meant you need to >>> review it again after the changes. >> We usually require at least 2 reviews by different people for all >> non-trivial changes. And I don’t consider the PPC64 >> part as trivial. In addition to that, I’m not familiar with Power 10. >> Best regards, >> Martin >> >> >> From: CoreyAshford <notificati...@github.com> >> Sent: Dienstag, 13. Oktober 2020 22:59 >> To: openjdk/jdk <j...@noreply.github.com> >> Cc: Doerr, Martin <martin.do...@sap.com>; Mention >> <ment...@noreply.github.com> >> Subject: Re: [openjdk/jdk] 8248188: Add IntrinsicCandidate and API for >> Base64 decoding (#293) >> >> >> Hi Corey, thanks for taking some stuff out of the “too short” path. There >> may be a performance regression when decoding >> many short arrays because of the stub call overhead and the usage of the >> slower part of the Java implementation. We >> could do it a little better in many cases to compute the maximum possible >> iteration count i: i = (sl - sp) / block_size >> if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think? >> Are you thinking of a case where that >> produces a higher iteration count? It looks effectively the same to me. I >> don’t think branch prediction hints are >> helpful for the “too short” check. >> My thinking is that most of the time when the intrinsic is called, it will >> not take the early exit, but I suppose when >> it is processing a sub-block_size buffer, it will return early every time. I >> will remove the hints. >> And we should better use CCR1 instead of CCR2 which is specified as >> non-volatile. >> >> Ah, I should have checked the calling conventions. I thought all of the CR* >> regs are volatile. I will fix that. >> >> Did you already find a 2nd reviewer for the PPC64 part? >> >> Your original comment said "2nd review", so I thought you meant you need to >> review it again after the changes. So, no, >> I haven't looked for or found a second reviewer. Any suggestions? The folks >> on the team here have been busy with other >> work. Btw, I'm off today, so I will push commits to the above-mentioned >> issues tomorrow. >> >> — >> You are receiving this because you were mentioned. >> Reply to this email directly, view it on >> GitHub<https://github.com/openjdk/jdk/pull/293#issuecomment-708005545>, or >> unsubscribe<https://github.com/notifications/unsubscribe-auth/AKR64KC2K7IZFAPXVQIVYZDSKS5SLANCNFSM4RVHNW5Q>. > >> Hi Corey, >> Are you thinking of a case where that produces a higher iteration count? >> Sorry for the confusion. This is also fine: length = sl - sp - 12 i = length >> / block_size if (i <= 0) return 0 But I >> still wonder why we should use 2 branches. Why not srawi_ ble(CCR0, >> return_zero) ? > > You're right! I originally thought that the `srawi.` was setting only the > Zero bit, but it sets others as well. > >> Ah, I should have checked the calling conventions. I thought all of the CR* >> regs are volatile. I will fix that. >> Actually, we do save and restore all CRs, so it’s not a real problem with >> the current implementation. But I prefer >> staying closer to the elf ABI as long as there’s no good reason to do it >> differently. > > Looks like I don't need that code at all now, but it's good to know for the > future; I have an encode intrinsic in the > works. >> Your original comment said "2nd review", so I thought you meant you need to >> review it again after the changes. >> We usually require at least 2 reviews by different people for all >> non-trivial changes. And I don’t consider the PPC64 >> part as trivial. In addition to that, I’m not familiar with Power 10. > > I received permission to request help from the GNU toolchain team here to > review it. Due to family issues and work > schedule on my end, it will be at least the middle of next week before I can > get a reviewer to have a look. > Thanks for your continued patience and help. Please update [compiler/graalunit/HotspotTest.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/graalunit/HotspotTest.java), and add the intrinsic signature. ------------- PR: https://git.openjdk.java.net/jdk/pull/293