> -----Original Message-----
> From: Jakub Jelinek <[email protected]>
> Sent: Wednesday, July 31, 2024 8:46 PM
> To: Prathamesh Kulkarni <[email protected]>
> Cc: Richard Biener <[email protected]>; Richard Sandiford
> <[email protected]>; [email protected]
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
>
> External email: Use caution opening links or attachments
>
>
> On Wed, Jul 31, 2024 at 02:58:34PM +0000, Prathamesh Kulkarni wrote:
> > There are a couple of issues in the patch:
> > (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of
> > mode_table, which should work for bp_unpack_poly_value, (since AFAIK,
> > it's only called by lto_input_mode_table). However, I am not sure if
> we will always call lto_input_mode_table before streaming in poly_int64
> / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a
> different place in LTO bytecode ?
>
> The poly_ints unpacked in lto_input_mode_table obviously are done after
> that.
> If you use it for streaming in from other sections, you need to check if
> they can't be read before the mode table.
> And, you don't really need to stream out/in the number for non-
> offloading LTO, that should use just NUM_POLY_INT_COEFFS.
>
> > --- a/gcc/data-streamer-in.cc
> > +++ b/gcc/data-streamer-in.cc
> > @@ -183,9 +183,7 @@ poly_uint64
> > streamer_read_poly_uint64 (class lto_input_block *ib) {
> > poly_uint64 res;
> > - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > - res.coeffs[i] = streamer_read_uhwi (ib);
> > - return res;
> > + POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))
>
> Why is this macro and not an inline function or inline function template
> oor inline function calling a lambda?
> Even if it has to be a macro (I don't see why), it should be defined
> such that you need to add ; at the end, ideally not include the return
> res; in there because it is just too weird if used like that (or make it
> return what will be returned and use return POLY_INT_READ_COMMON...) and
> there needs to be a space in between COMMON and (.
>
> > @@ -194,9 +192,7 @@ poly_int64
> > streamer_read_poly_int64 (class lto_input_block *ib) {
> > poly_int64 res;
> > - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > - res.coeffs[i] = streamer_read_hwi (ib);
> > - return res;
> > + POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
> > }
>
> Ditto.
> > + __typeof(x.coeffs[0]) val = streamer_read_coeff;
> \
>
> You certainly can't use a GCC extension like __typeof here.
> Plus missing space.
>
> > + if (val != 0)
> \
> > + fatal_error (input_location,
> \
> > + "Degree of %<poly_int%> exceeds "
> \
>
> Diagnostics shouldn't start with uppercase letter.
>
> > + "%<NUM_POLY_INT_COEFFS%> (%d)",
> \
> > + NUM_POLY_INT_COEFFS);
> \
> > + }
> \
> > + }
> \
> > +
> \
> > + return x;
> \
> > +}
> > +
> > --- a/gcc/poly-int.h
> > +++ b/gcc/poly-int.h
> > @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
> > ? (void) ((RES).coeffs[I] = VALUE) \
> > : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C
> > (VALUE)))
> >
> > +/* Number of bits needed to represent maximum value of
> > + NUM_POLY_INT_COEFFS defined by any target. */ #define
> > +MAX_NUM_POLY_INT_COEFFS_BITS (2)
>
> Why (2) and not just 2?
> There should be some static_assert to make sure it is a maximum for any
> target.
>
> > + if (!integer_zerop (val))
> > + fatal_error (input_location,
> > + "Degree of %<poly_int%> exceeds "
>
> Again.
Hi Jakub,
Thanks for the suggestions. The attached patch streams out NUM_POLY_INT_COEFFS
only if offloading is enabled,
defines poly_int_read_common to be variadic template, and asserts that
host_num_poly_int_coeffs is streamed-in
before reading poly_int (AFAIU, the only user of streamer_read_poly_int64
currently is ipa-modref pass).
Patch passes LTO bootstrap+test on aarch64-linux-gnu and in-progress on
x86_64-linux-gnu. To test this patch with
offloading enabled, I have so far just been testing libgomp (make
check-target-libgomp). Should I be testing any
additional parts of the testsuite for offloading ?
Does the attached patch look in the right direction ?
Signed-off-by: Prathamesh Kulkarni <[email protected]>
Thanks,
Prathamesh
> > + "%<NUM_POLY_INT_COEFFS%>");
> > + }
> > + }
> > }
>
> Jakub
Partially support streaming of poly_int for offloading.
The patch streams out host NUM_POLY_INT_COEFFS, and changes
streaming in as follows:
if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
for (i = 0; i < host_num_poly_int_coeffs; i++)
poly_int.coeffs[i] = stream_in coeff;
for (; i < NUM_POLY_INT_COEFFS; i++)
poly_int.coeffs[i] = 0;
}
else
{
for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
poly_int.coeffs[i] = stream_in coeff;
/* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS. */
for (; i < host_num_poly_int_coeffs; i++)
{
val = stream_in coeff;
if (val != 0)
error ();
}
}
gcc/ChangeLog:
PR ipa/96265
PR ipa/111937
* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
streaming, and call poly_int_read_common instead.
(streamer_read_poly_int64): Likewise.
* data-streamer.cc (host_num_poly_int_coeffs): New variable.
* data-streamer.h (host_num_poly_int_coeffs): Declare.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Remove code for streaming and call
poly_int_read_common instead.
* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs.
* lto-streamer-out.cc (lto_write_mode_table): Stream out
NUM_POLY_INT_COEFFS if offloading is enabled.
* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
streaming-in of poly_int.
Signed-off-by: Prathamesh Kulkarni <[email protected]>
diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..7b9d8cc0129 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
poly_uint64
streamer_read_poly_uint64 (class lto_input_block *ib)
{
- poly_uint64 res;
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
- res.coeffs[i] = streamer_read_uhwi (ib);
- return res;
+ return poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type>
+ (streamer_read_uhwi, ib);
}
/* Read a poly_int64 from IB. */
@@ -193,10 +191,8 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
poly_int64
streamer_read_poly_int64 (class lto_input_block *ib)
{
- poly_int64 res;
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
- res.coeffs[i] = streamer_read_hwi (ib);
- return res;
+ return poly_int_read_common<poly_int_traits<poly_int64>::coeff_type>
+ (streamer_read_hwi, ib);
}
/* Read gcov_type value from IB. */
diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc
index 346b294c72a..5bf2f086d2a 100644
--- a/gcc/data-streamer.cc
+++ b/gcc/data-streamer.cc
@@ -28,6 +28,12 @@ along with GCC; see the file COPYING3. If not see
#include "cgraph.h"
#include "data-streamer.h"
+/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
+ stored at beginning of mode_table. While streaming-in, the value is read in
+ host_num_poly_int_coeffs. */
+
+unsigned host_num_poly_int_coeffs = 0;
+
/* Pack WORK into BP in a variant of uleb format. */
void
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..c77eacb74ca 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const
REAL_VALUE_TYPE *);
void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *);
unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *);
HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *);
+extern unsigned host_num_poly_int_coeffs;
/* In data-streamer-out.cc */
void streamer_write_zero (struct output_block *);
@@ -194,15 +195,53 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
return val & mask;
}
+/* Common code for reading poly_int. */
+
+template<typename C, typename F, typename ...Args>
+poly_int<NUM_POLY_INT_COEFFS, C>
+poly_int_read_common (F read_coeff, Args ...args)
+{
+ poly_int<NUM_POLY_INT_COEFFS, C> x;
+ unsigned i;
+
+#ifndef ACCEL_COMPILER
+ host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+ /* Ensure that we have streamed-in host_num_poly_int_coeffs. */
+ gcc_assert (host_num_poly_int_coeffs > 0);
+ if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+ {
+ for (i = 0; i < host_num_poly_int_coeffs; i++)
+ x.coeffs[i] = read_coeff (args...);
+ for (; i < NUM_POLY_INT_COEFFS; i++)
+ x.coeffs[i] = 0;
+ }
+ else
+ {
+ for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+ x.coeffs[i] = read_coeff (args...);
+
+ /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS. */
+ for (; i < host_num_poly_int_coeffs; i++)
+ {
+ C val = read_coeff (args...);
+ if (val != 0)
+ fatal_error (input_location,
+ "degree of %<poly_int%> exceeds "
+ "%<NUM_POLY_INT_COEFFS%> (%d)",
+ NUM_POLY_INT_COEFFS);
+ }
+ }
+ return x;
+}
+
/* Unpacks a polynomial value from the bit-packing context BP in which each
coefficient has NBITS bits. */
inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
{
- poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
- for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
- x.coeffs[i] = bp_unpack_value (bp, nbits);
- return x;
+ return poly_int_read_common<bitpack_word_t> (bp_unpack_value, bp, nbits);
}
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..3e2c786fc36 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -2013,6 +2013,9 @@ lto_input_mode_table (struct lto_file_decl_data
*file_data)
header->string_size, vNULL);
bitpack_d bp = streamer_read_bitpack (&ib);
+ host_num_poly_int_coeffs
+ = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
unsigned mode_bits = bp_unpack_value (&bp, 5);
unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index c329ac8af95..523d6dad221 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3192,6 +3192,9 @@ lto_write_mode_table (void)
ob = create_output_block (LTO_section_mode_table);
bitpack_d bp = bitpack_create (ob->main_stream);
+ if (lto_stream_offload_p)
+ bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS);
+
/* Ensure that for GET_MODE_INNER (m) != m we have
also the inner mode marked. */
for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..94708165961 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
? (void) ((RES).coeffs[I] = VALUE) \
: (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
+/* Number of bits needed to represent maximum value of
+ NUM_POLY_INT_COEFFS defined by any target. */
+#define MAX_NUM_POLY_INT_COEFFS_BITS 2
+
/* poly_int_full and poly_int_hungry are used internally within poly_int
for delegated initializers. poly_int_full indicates that a parameter
pack has enough elements to initialize every coefficient. poly_int_hungry
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..62200ef2d9d 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,35 @@ static void
lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
class data_in *data_in, tree expr)
{
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
- POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+#ifndef ACCEL_COMPILER
+ host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+ /* Ensure that we have streamed-in host_num_poly_int_coeffs. */
+ gcc_assert (host_num_poly_int_coeffs > 0);
+ unsigned i;
+ if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+ {
+ for (i = 0; i < host_num_poly_int_coeffs; i++)
+ POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+ tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+ for (; i < NUM_POLY_INT_COEFFS; i++)
+ POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
+ }
+ else
+ {
+ for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+ POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+ for (; i < host_num_poly_int_coeffs; i++)
+ {
+ tree val = stream_read_tree_ref (ib, data_in);
+ if (!integer_zerop (val))
+ fatal_error (input_location,
+ "degree of %<poly_int%> exceeds "
+ "%<NUM_POLY_INT_COEFFS%>");
+ }
+ }
}