> -----Original Message----- > From: Prathamesh Kulkarni <prathame...@nvidia.com> > Sent: 21 November 2024 10:31 > To: Prathamesh Kulkarni <prathame...@nvidia.com>; Andrew Stubbs > <a...@baylibre.com>; Jakub Jelinek <ja...@redhat.com> > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge > <tschwi...@baylibre.com> > Subject: RE: [RFC] Enabling SVE with offloading to nvptx > > > > > -----Original Message----- > > From: Gcc <gcc-bounces~prathameshk=nvidia....@gcc.gnu.org> On Behalf > > Of Prathamesh Kulkarni via Gcc > > Sent: 14 November 2024 13:59 > > To: Andrew Stubbs <a...@baylibre.com>; Jakub Jelinek > <ja...@redhat.com> > > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener > > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge > > <tschwi...@baylibre.com> > > Subject: RE: [RFC] Enabling SVE with offloading to nvptx > > > > External email: Use caution opening links or attachments > > > > > > > -----Original Message----- > > > From: Andrew Stubbs <a...@baylibre.com> > > > Sent: 12 November 2024 20:23 > > > To: Prathamesh Kulkarni <prathame...@nvidia.com>; Jakub Jelinek > > > <ja...@redhat.com> > > > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener > > > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge > > > <tschwi...@baylibre.com> > > > Subject: Re: [RFC] Enabling SVE with offloading to nvptx > > > > > > External email: Use caution opening links or attachments > > > > > > > > > On 12/11/2024 06:01, Prathamesh Kulkarni via Gcc wrote: > > > > > > > > > > > >> -----Original Message----- > > > >> From: Jakub Jelinek <ja...@redhat.com> > > > >> Sent: 04 November 2024 21:44 > > > >> To: Prathamesh Kulkarni <prathame...@nvidia.com> > > > >> Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener > > > >> <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge > > > >> <tschwi...@baylibre.com> > > > >> Subject: Re: [RFC] Enabling SVE with offloading to nvptx > > > >> > > > >> External email: Use caution opening links or attachments > > > >> > > > >> > > > >> On Sat, Nov 02, 2024 at 03:53:34PM +0000, Prathamesh Kulkarni > > > wrote: > > > >>> The attached patch adds a new bitfield needs_max_vf_lowering > to > > > >> loop, > > > >>> and sets that in expand_omp_simd for loops that need delayed > > > >> lowering > > > >>> of safelen and omp simd arrays. The patch defines a new macro > > > >>> OMP_COMMON_MAX_VF (arbitrarily set to 16), as a placeholder > > value > > > >> for > > > >>> max_vf (instead of INT_MAX), and is later replaced by > > appropriate > > > >>> max_vf during omp_adjust_max_vf pass. Does that look OK ? > > > >> > > > >> No. > > > >> The thing is, if user doesn't specify safelen, it defaults to > > > >> infinity (which we represent as INT_MAX), if user specifies it, > > > then > > > >> that is the maximum for it (currently in OpenMP specification > it > > is > > > >> just an integral value, so can't be a poly int). > > > >> And then the lowering uses the max_vf as another limit, what > the > > hw > > > >> can do at most and sizes the magic arrays with it. So, one > needs > > > to > > > >> use minimum of what user specified and what the hw can handle. > > > >> So using 16 as some magic value is just wrong, safelen(16) can > be > > > >> specified in the source as well, or safelen(8), or safelen(32) > or > > > >> safelen(123). > > > >> > > > >> Thus, the fact that the hw minimum hasn't been determined yet > > needs > > > >> to be represented in some other flag, not in loop->safelen > value, > > > and > > > >> before that is determined, loop->safelen should then represent > > what > > > >> the user wrote (or was implied) and the later pass should use > > > minimum > > > >> from loop->safelen and the picked hw maximum. Of course if the > > > >> picked hw maximum is POLY_INT-ish, the big question is how to > > > compare > > > >> that against the user supplied integer value, either one can > just > > > >> handle the INT_MAX (aka > > > >> infinity) special case, or say query the backend on what is the > > > >> maximum value of the POLY_INT at runtime and only use the > > POLY_INT > > > if > > > >> it is always known to be smaller or equal to the user supplied > > > >> safelen. > > > >> > > > >> Another thing (already mentioned in the thread Andrew > referenced) > > > is > > > >> that max_vf is used in two separate places. One is just to > size > > of > > > >> the magic arrays and one of the operands of the minimum (the > > other > > > is > > > >> user specified safelen). In this case, it is generally just > fine > > > to > > > >> pick later value than strictly necessary (as long as it is > never > > > >> larger than user supplied safelen). > > > >> The other case is simd modifier on schedule clause. That value > > > >> should better be the right one or slightly larger, but not too > > > much. > > > >> I think currently we just use the INTEGER_CST we pick as the > > > maximum, > > > >> if this sizing is deferred, maybe it needs to be another > internal > > > >> function that asks the value (though, it can refer to a loop vf > > in > > > >> another function, which complicates stuff). > > > >> > > > >> Regarding Richi's question, I'm afraid the OpenMP simd loop > > > lowering > > > >> can't be delayed until some later pass. > > > > Hi Jakub, > > > > Thanks for the suggestions! The attached patch makes the > following > > > changes: > > > > (1) Delays setting of safelen for offloading by introducing a > new > > > > bitfield needs_max_vf_lowering in loop, which is true with > > > offloading enabled, and safelen is then set to min(safelen, > max_vf) > > > for the target later in omp_device_lower pass. > > > > Comparing user-specified safelen with poly_int max_vf may not be > > > > always possible at compile-time (say 32 and 16+16x), and even if > > we > > > determine runtime VL based on -mcpu flags, I guess relying on that > > > won't be portable ? > > > > The patch works around this by taking constant_lower_bound > > (max_vf), > > > > and comparing it with safelen instead, with the downside that > > > constant_lower_bound(max_vf) will not be the optimal max_vf for > SVE > > > target if it implements SIMD width > 128 bits. > > > > > > > > (2) Since max_vf is used as length of omp simd array, it gets > > > streamed > > > > out to device, and device compilation fails during streaming-in > if > > > > max_vf is poly_int (16+16x), and device's NUM_POLY_INT_COEFFS < > 2 > > > (which motivated my patch). The patch tries to address this by > > simply > > > setting length to a placeholder value (INT_MAX?) in > > > lower_rec_simd_input_clauses if offloading is enabled, and will be > > > later set to appropriate value in omp_device_lower pass. > > > > > > > > (3) Andrew's patches seems to already fix the case for adjusting > > > > chunk_size for schedule clause with simd modifier by introducing > a > > > new internal function .GOMP_MAX_VF, which is then replaced by > > target's > > > max_vf. To keep it consistent with safelen, the patch here uses > > > constant_lower_bound (max_vf) too. > > > > > > > > Patch passes libgomp testing for AArch64/nvptx offloading (with > > and > > > without GPU). > > > > Does it look OK ? > > > > > > I've not reviewed the patch in detail, but I can confirm that this > > > does not break my usecase or cause any test regressions, for me. > > Hi Andrew, > > Thanks for testing the patch! > > > > > > However, are you sure that ompdevlow is always running? I think > you > > > need to add this, somewhere: > > > > > > cfun->curr_properties &= ~PROP_gimple_lomp_dev; > > > > > > My patch added this into omp_adjust_chunk_size, which triggers it > > for > > > certain schedule clauses, but I think you want it whenever safelen > > is > > > needed? > > Ah indeed, thanks for the suggestions. > > I have tried to fix it in the attached patch. > Hi Jakub, > Could you please take a look at this patch: > https://gcc.gnu.org/pipermail/gcc/2024-November/245139.html > > It makes the following changes: > (1) Delays setting of safelen for offloading by introducing a new > bitfield needs_max_vf_lowering in loop, which is true with offloading > enabled, and safelen is then set to min(safelen, max_vf) for the > target later in omp_device_lower pass. > > Comparing user-specified safelen with poly_int max_vf may not be > always possible at compile-time (say 32 and 16+16x), and even if we > determine runtime VL based on -mcpu flags, I guess relying on that > won't be portable ? The patch works around this by taking > constant_lower_bound (max_vf), and comparing it with safelen instead, > with the downside that constant_lower_bound(max_vf) will not be the > optimal max_vf for SVE target if it implements SIMD width > 128 bits. > > (2) Since max_vf is used as length of omp simd array, it gets streamed > out to device, and device compilation fails during streaming-in if > max_vf is poly_int (16+16x), and device's NUM_POLY_INT_COEFFS < 2 > (which motivated my patch). The patch tries to address this by simply > setting length to a placeholder value (INT_MAX?) in > lower_rec_simd_input_clauses if offloading is enabled, and will be > later set to appropriate value in omp_device_lower pass. > > (3) Andrew's patches seems to already fix the case for adjusting > chunk_size for schedule clause with simd modifier by introducing a new > internal function .GOMP_MAX_VF, which is then replaced by target's > max_vf. To keep it consistent with safelen, the patch here uses > constant_lower_bound (max_vf) too. > > Patch passes libgomp testing for AArch64/nvptx offloading (with and > without GPU). > Does it look OK ? Hi Jakub, ping: https://gcc.gnu.org/pipermail/gcc/2024-November/245139.html
Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > Andrew > > > > > > > > > > > Thanks, > > > > Prathamesh > > > >> > > > >> Jakub > > > >