https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124531

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly adjusted testcase:

char *
foo ()
{
  return new char[] {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0, 1 };
}
or
char *
bar ()
{
  return new char[] {
#embed "something_larger_than_64_bytes.bin"
  };
}

The problem is that the FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init),
idx, field, elt) loop in build_vec_init doesn't handle RAW_DATA_CST.  I'm
afraid that, despite the r15-7810-g173cf7c9b8c0d61 PR109431 change (CCing
Marek), it doesn't handle RANGE_EXPRs either: that change made it count
RANGE_EXPR in num_initialized_elts correctly, but hasn't changed the actual
code generation, so I'm afraid it will still store just one elt instead of say
200 or whatever the range covers.  One would need to repeat the store or
handle it in a loop.
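
To make the counting versus storing distinction concrete, here is a minimal
toy model in plain C++.  It is only a sketch: Entry, covered and expand are
hypothetical names invented for illustration and are not GCC data structures
or functions.  It shows that every entry has to produce as many stores as the
number of elements it covers, not just be counted correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one initializer entry; NOT a GCC tree node.
struct Entry {
  enum Kind { Single, Range, RawData } kind;
  std::size_t lo, hi;       // inclusive index range, used by Range only
  char value;               // used by Single and Range
  std::vector<char> bytes;  // used by RawData only
};

// Number of array elements a single entry covers.
std::size_t covered(const Entry &e) {
  switch (e.kind) {
  case Entry::Single:  return 1;
  case Entry::Range:   assert(e.lo <= e.hi); return e.hi - e.lo + 1;
  case Entry::RawData: return e.bytes.size();
  }
  return 0;
}

// Perform the stores for all entries and return how many elements were
// written.  Range and RawData entries store once per covered element.
std::size_t expand(const std::vector<Entry> &entries, char *dst) {
  std::size_t n = 0;
  for (const Entry &e : entries) {
    const std::size_t count = covered(e);
    for (std::size_t i = 0; i < count; ++i)
      dst[n + i] = (e.kind == Entry::RawData) ? e.bytes[i] : e.value;
    n += count;
  }
  return n;
}
```

With one Single entry, one Range entry covering indexes 1 to 200 and one
three byte RawData entry, expand writes 204 elements in total.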

Now, obviously RAW_DATA_CST can be handled as a set of separate stores, ditto
the RANGE_EXPR (but that one can be also optimized into a loop).
In GCC 14, for the above first testcase we emit what is gimplified as:
  D.2794 = operator new [] (130);
  D.2798 = 1;
  try
    {
      D.2795 = D.2794;
      D.2796 = D.2795;
      D.2797 = 129;
      *D.2796 = 0;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 1;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 2;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 3;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 4;
...
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 15;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 0;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      *D.2796 = 1;
      D.2796 = D.2796 + 1;
      D.2797 = D.2797 + -1;
      retval.0 = D.2795;
      D.2798 = 0;
      D.2799 = D.2794;
      return D.2799;
That is fine for something fairly small, but already inappropriate for arrays
of 130 elements like here, and even worse for say new char[1000000000] with an
initializer of 1 billion constants.

The 130-element case is actually not that bad, just annoying compile time and
memory wise; with -O2 we SLP vectorize it into:
  _3 = operator new [] (130);
  vectp.5_134 = _3;
  MEM <vector(16) char> [(char *)vectp.5_134] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_136 = vectp.5_134 + 16;
  MEM <vector(16) char> [(char *)vectp.5_136] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_138 = vectp.5_134 + 32;
  MEM <vector(16) char> [(char *)vectp.5_138] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_140 = vectp.5_134 + 48;
  MEM <vector(16) char> [(char *)vectp.5_140] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_142 = vectp.5_134 + 64;
  MEM <vector(16) char> [(char *)vectp.5_142] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_144 = vectp.5_134 + 80;
  MEM <vector(16) char> [(char *)vectp.5_144] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_146 = vectp.5_134 + 96;
  MEM <vector(16) char> [(char *)vectp.5_146] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.5_148 = vectp.5_134 + 112;
  MEM <vector(16) char> [(char *)vectp.5_148] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  vectp.7_150 = _3 + 128;
  MEM <vector(2) char> [(char *)vectp.7_150] = { 0, 1 };
  return _3;
and with -O2 -fno-tree-vectorize at least store merging kicks in:
  _3 = operator new [] (130);
  MEM <unsigned long> [(char *)_3] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 8B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 16B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 24B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 32B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 40B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 48B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 56B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 64B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 72B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 80B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 88B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 96B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 104B] = 1084818905618843912;
  MEM <unsigned long> [(char *)_3 + 112B] = 506097522914230528;
  MEM <unsigned long> [(char *)_3 + 120B] = 1084818905618843912;
  MEM <unsigned short> [(char *)_3 + 128B] = 256;
  return _3;
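
The two 64-bit constants in the store-merged dump are just the bytes 0..7 and
8..15 packed in little-endian order, and the trailing unsigned short 256 is the
byte pair { 0, 1 }.  A small sketch to check that arithmetic follows; pack_le
is a hypothetical helper name, and little-endian byte order is an assumption
about the target the dump was produced for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Combine up to 8 consecutive byte values into one integer, with the first
// byte in the least significant position (little-endian layout, assumed).
std::uint64_t pack_le(const unsigned char *bytes, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  return v;
}
```

Packing { 0, 1, ..., 7 } gives 0x0706050403020100, which is
506097522914230528, and packing { 8, 9, ..., 15 } gives 0x0f0e0d0c0b0a0908,
which is 1084818905618843912.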

But IMHO for really large CONSTRUCTORs (the question is from which size), if
they have integral type elements and at least when the CONSTRUCTOR is
TREE_CONSTANT, we should just build a MODIFY_EXPR with a MEM_REF of ARRAY_TYPE
on the lhs and the CONSTRUCTOR on the rhs and let the gimplifier handle it.
The gimplifier can then choose to e.g. put the initializer into .rodata etc.
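
As a rough source-level analogy of that idea (not what the front end or the
gimplifier literally emits), the whole-array assignment behaves like copying
from a constant image of the initializer.  The names Image, make_image,
init_image and foo_copy below are hypothetical, the sketch assumes C++14 or
later for the constexpr loop, and whether the image really ends up in .rodata
and is copied with a block move is left to the compiler's heuristics.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical constant image of the 130-element initializer from foo ().
struct Image { char data[130]; };

constexpr Image make_image() {
  Image im{};
  for (std::size_t i = 0; i < sizeof im.data; ++i)
    im.data[i] = static_cast<char>(i % 16);  // 0..15 repeated, ending in 0, 1
  return im;
}

// A constant object like this is a candidate for read-only data.
static constexpr Image init_image = make_image();

// Allocate the array and initialize it with a single block copy instead of
// one store per element.
char *foo_copy() {
  char *p = new char[sizeof init_image.data];
  std::memcpy(p, init_image.data, sizeof init_image.data);
  return p;
}
```

The caller owns the result and releases it with delete[], as with the
original foo ().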
