RE: [RFC] Enabling SVE with offloading to nvptx

Prathamesh Kulkarni via Gcc Thu, 28 Nov 2024 02:37:01 -0800


> -----Original Message-----
> From: Prathamesh Kulkarni <prathame...@nvidia.com>
> Sent: 21 November 2024 10:31
> To: Prathamesh Kulkarni <prathame...@nvidia.com>; Andrew Stubbs
> <a...@baylibre.com>; Jakub Jelinek <ja...@redhat.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener
> <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge
> <tschwi...@baylibre.com>
> Subject: RE: [RFC] Enabling SVE with offloading to nvptx
> 
> 
> 
> > -----Original Message-----
> > From: Gcc <gcc-bounces~prathameshk=nvidia....@gcc.gnu.org> On Behalf
> > Of Prathamesh Kulkarni via Gcc
> > Sent: 14 November 2024 13:59
> > To: Andrew Stubbs <a...@baylibre.com>; Jakub Jelinek
> <ja...@redhat.com>
> > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener
> > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge
> > <tschwi...@baylibre.com>
> > Subject: RE: [RFC] Enabling SVE with offloading to nvptx
> >
> > External email: Use caution opening links or attachments
> >
> >
> > > -----Original Message-----
> > > From: Andrew Stubbs <a...@baylibre.com>
> > > Sent: 12 November 2024 20:23
> > > To: Prathamesh Kulkarni <prathame...@nvidia.com>; Jakub Jelinek
> > > <ja...@redhat.com>
> > > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener
> > > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge
> > > <tschwi...@baylibre.com>
> > > Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On 12/11/2024 06:01, Prathamesh Kulkarni via Gcc wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Jakub Jelinek <ja...@redhat.com>
> > > >> Sent: 04 November 2024 21:44
> > > >> To: Prathamesh Kulkarni <prathame...@nvidia.com>
> > > >> Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener
> > > >> <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge
> > > >> <tschwi...@baylibre.com>
> > > >> Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> > > >>
> > > >> External email: Use caution opening links or attachments
> > > >>
> > > >>
> > > >> On Sat, Nov 02, 2024 at 03:53:34PM +0000, Prathamesh Kulkarni
> > > wrote:
> > > >>> The attached patch adds a new bitfield needs_max_vf_lowering
> to
> > > >> loop,
> > > >>> and sets that in expand_omp_simd for loops that need delayed
> > > >> lowering
> > > >>> of safelen and omp simd arrays.  The patch defines a new macro
> > > >>> OMP_COMMON_MAX_VF (arbitrarily set to 16), as a placeholder
> > value
> > > >> for
> > > >>> max_vf (instead of INT_MAX), and is later replaced by
> > appropriate
> > > >>> max_vf during omp_adjust_max_vf pass.  Does that look OK ?
> > > >>
> > > >> No.
> > > >> The thing is, if user doesn't specify safelen, it defaults to
> > > >> infinity (which we represent as INT_MAX), if user specifies it,
> > > then
> > > >> that is the maximum for it (currently in OpenMP specification
> it
> > is
> > > >> just an integral value, so can't be a poly int).
> > > >> And then the lowering uses the max_vf as another limit, what
> the
> > hw
> > > >> can do at most and sizes the magic arrays with it.  So, one
> needs
> > > to
> > > >> use minimum of what user specified and what the hw can handle.
> > > >> So using 16 as some magic value is just wrong, safelen(16) can
> be
> > > >> specified in the source as well, or safelen(8), or safelen(32)
> or
> > > >> safelen(123).
> > > >>
> > > >> Thus, the fact that the hw minimum hasn't been determined yet
> > needs
> > > >> to be represented in some other flag, not in loop->safelen
> value,
> > > and
> > > >> before that is determined, loop->safelen should then represent
> > what
> > > >> the user wrote (or was implied) and the later pass should use
> > > minimum
> > > >> from loop->safelen and the picked hw maximum.  Of course if the
> > > >> picked hw maximum is POLY_INT-ish, the big question is how to
> > > compare
> > > >> that against the user supplied integer value, either one can
> just
> > > >> handle the INT_MAX (aka
> > > >> infinity) special case, or say query the backend on what is the
> > > >> maximum value of the POLY_INT at runtime and only use the
> > POLY_INT
> > > if
> > > >> it is always known to be smaller or equal to the user supplied
> > > >> safelen.
> > > >>
> > > >> Another thing (already mentioned in the thread Andrew
> referenced)
> > > is
> > > >> that max_vf is used in two separate places.  One is just to
> size
> > of
> > > >> the magic arrays and one of the operands of the minimum (the
> > other
> > > is
> > > >> user specified safelen).  In this case, it is generally just
> fine
> > > to
> > > >> pick later value than strictly necessary (as long as it is
> never
> > > >> larger than user supplied safelen).
> > > >> The other case is simd modifier on schedule clause.  That value
> > > >> should better be the right one or slightly larger, but not too
> > > much.
> > > >> I think currently we just use the INTEGER_CST we pick as the
> > > maximum,
> > > >> if this sizing is deferred, maybe it needs to be another
> internal
> > > >> function that asks the value (though, it can refer to a loop vf
> > in
> > > >> another function, which complicates stuff).
> > > >>
> > > >> Regarding Richi's question, I'm afraid the OpenMP simd loop
> > > lowering
> > > >> can't be delayed until some later pass.
> > > > Hi Jakub,
> > > > Thanks for the suggestions! The attached patch makes the
> following
> > > changes:
> > > > (1) Delays setting of safelen for offloading by introducing a
> new
> > > > bitfield needs_max_vf_lowering in loop, which is true with
> > > offloading enabled, and safelen is then set to min(safelen,
> max_vf)
> > > for the target later in omp_device_lower pass.
> > > > Comparing user-specified safelen with poly_int max_vf may not be
> > > > always possible at compile-time (say 32 and 16+16x), and even if
> > we
> > > determine runtime VL based on -mcpu flags, I guess relying on that
> > > won't be portable ?
> > > > The patch works around this by taking constant_lower_bound
> > (max_vf),
> > > > and comparing it with safelen instead, with the downside that
> > > constant_lower_bound(max_vf) will not be the optimal max_vf for
> SVE
> > > target if it implements SIMD width > 128 bits.
> > > >
> > > > (2) Since max_vf is used as length of omp simd array, it gets
> > > streamed
> > > > out to device, and device compilation fails during streaming-in
> if
> > > > max_vf is poly_int (16+16x), and device's NUM_POLY_INT_COEFFS <
> 2
> > > (which motivated my patch). The patch tries to address this by
> > simply
> > > setting length to a placeholder value (INT_MAX?) in
> > > lower_rec_simd_input_clauses if offloading is enabled, and will be
> > > later set to appropriate value in omp_device_lower pass.
> > > >
> > > > (3) Andrew's patches seems to already fix the case for adjusting
> > > > chunk_size for schedule clause with simd modifier by introducing
> a
> > > new internal function .GOMP_MAX_VF, which is then replaced by
> > target's
> > > max_vf. To keep it consistent with safelen, the patch here uses
> > > constant_lower_bound (max_vf) too.
> > > >
> > > > Patch passes libgomp testing for AArch64/nvptx offloading (with
> > and
> > > without GPU).
> > > > Does it look OK ?
> > >
> > > I've not reviewed the patch in detail, but I can confirm that this
> > > does not break my usecase or cause any test regressions, for me.
> > Hi Andrew,
> > Thanks for testing the patch!
> > >
> > > However, are you sure that ompdevlow is always running? I think
> you
> > > need to add this, somewhere:
> > >
> > >    cfun->curr_properties &= ~PROP_gimple_lomp_dev;
> > >
> > > My patch added this into omp_adjust_chunk_size, which triggers it
> > for
> > > certain schedule clauses, but I think you want it whenever safelen
> > is
> > > needed?
> > Ah indeed, thanks for the suggestions.
> > I have tried to fix it in the attached patch.
> Hi Jakub,
> Could you please take a look at this patch:
> https://gcc.gnu.org/pipermail/gcc/2024-November/245139.html
> 
> It makes the following changes:
> (1) Delays setting of safelen for offloading by introducing a new
> bitfield needs_max_vf_lowering in loop, which is true with offloading
> enabled, and safelen is then set to min(safelen, max_vf) for the
> target later in omp_device_lower pass.
> 
> Comparing user-specified safelen with poly_int max_vf may not be
> always possible at compile-time (say 32 and 16+16x), and even if we
> determine runtime VL based on -mcpu flags, I guess relying on that
> won't be portable ? The patch works around this by taking
> constant_lower_bound (max_vf), and comparing it with safelen instead,
> with the downside that constant_lower_bound(max_vf) will not be the
> optimal max_vf for SVE target if it implements SIMD width > 128 bits.
> 
> (2) Since max_vf is used as length of omp simd array, it gets streamed
> out to device, and device compilation fails during streaming-in if
> max_vf is poly_int (16+16x), and device's NUM_POLY_INT_COEFFS < 2
> (which motivated my patch). The patch tries to address this by simply
> setting length to a placeholder value (INT_MAX?) in
> lower_rec_simd_input_clauses if offloading is enabled, and will be
> later set to appropriate value in omp_device_lower pass.
> 
> (3) Andrew's patches seems to already fix the case for adjusting
> chunk_size for schedule clause with simd modifier by introducing a new
> internal function .GOMP_MAX_VF, which is then replaced by target's
> max_vf. To keep it consistent with safelen, the patch here uses
> constant_lower_bound (max_vf) too.
> 
> Patch passes libgomp testing for AArch64/nvptx offloading (with and
> without GPU).
> Does it look OK ?
Hi Jakub,
ping: https://gcc.gnu.org/pipermail/gcc/2024-November/245139.html


Thanks,
Prathamesh
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Andrew
> > >
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >>
> > > >>          Jakub
> > > >

RE: [RFC] Enabling SVE with offloading to nvptx

Reply via email to