Anybody who might have gotten interested should stand down.

As usual, that analysis got me thinking.

I got focused on where var_decl_return_code was being used.  (I was wrong.
I made the mistake because I had just eliminated two sets of errors caused
by the optimization actually optimizing away things I need, so I had that
in the front of my brain.)  Richard told me of something odd at the point
where var_decl_return was being established.  I finally decided to look at
that.

Turned out, ultimately, to be a SHORT / USHORT mismatch on the variables
being given to a MODIFY_EXPR.  Apparently the optimization algorithms can
be extremely cranky about value types.

In any event, with that straightened out, everything is working without
the flag_strict_aliasing modification.

Thanks for asking, and thanks for listening.

> -----Original Message-----
> From: Robert Dubner <rdub...@symas.com>
> Sent: Friday, April 4, 2025 16:02
> To: Sam James <s...@gentoo.org>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: RE: [committed] cobol: Eliminate cobolworx UAT errors when
> compiling with -Os
> 
> > -----Original Message-----
> > From: Sam James <mailto:s...@gentoo.org>
> > Sent: Friday, April 4, 2025 14:28
> > To: Robert Dubner <mailto:rdub...@symas.com>
> > Cc: 'GCC Patches' <mailto:gcc-patches@gcc.gnu.org>
> > Subject: Re: [committed] cobol: Eliminate cobolworx UAT errors when
> > compiling with -Os
> >
> > Robert Dubner <mailto:rdub...@symas.com> writes:
> >
> > > From e70fe5ed46ab129a8b1da961c47d3fb75b11b988 Mon Sep 17 00:00:00
2001
> > > From: Bob Dubner mailto:rdub...@symas.com
> > > Date: Fri, 4 Apr 2025 13:48:58 -0400
> > > Subject: [PATCH] cobol: Eliminate cobolworx UAT errors when
compiling
> > with
> > > -Os
> > >
> > > Testcases compiled with -Os were failing because static functions
and
> > > static
> > > variables were being optimized away, because of improper data type
> > casts,
> > > and
> > > because strict aliasing (whatever that is) was resulting in some
loss
> > > of
> >
> > Are you unfamiliar with that from C and C++? See
> > https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 (we
all
> > have our favourite documents to explain it) but I don't know if COBOL
is
> > amenable to the concept of TBAA.
> >
> > > data.
> > > These changes eliminate those known problems.
> >
> > I'd suggest that this should be accompanied by some question,
otherwise
> > it's going to live there forever and it's not necessarily right
(though
> > see above - if COBOL is incompatible with the idea, it might need
> > something along those lines, though not sure this is the right way of
> > doing that).
> >
> > That is, while you're free to approve your own COBOL patches, you're
> > also free to CC others and ask them for advice even if not explicit
> > approval before pushing them if something doesn't seem to be correct
or
> > is a hack.
> 
> I am not any kind of compiler expert, except, possibly in one way: I
don't
> trust them.  I have never trusted them.  I don't trust them, and I don't
> trust the computers they run on.  I regard compilers the same way a
> lion-tamer regards the big cats they work with every day.  I treat them
> with firm respect; I expect them to behave the way they've been trained
--
> and I never turn my back on them.  So, no, I don't understand TBAA; I
had
> to look it up.  I don't understand aliasing.
> 
> The problem at hand is not a COBOL problem.  It is a Bob problem.
Richard
> pointed me at the original root of the thing.  I believe I have
addressed
> that.  But the problem has not gone away.  The "flag_strict_aliasing =
0;"
> solution makes the symptoms go away.  I spent a couple of hours messing
> with this, and I was unable to resolve it.  So I have shrugged and used
> what I have.
> 
> You probably have figured out by now that when I need help, it's because
> 1) I have missed something stupid, 2) I was just plain ignorant, or 3)
> It's a hard problem.
> 
> And because it's a hard problem, it's hard to describe.  But you sort of
> asked, so I will sort of try to explain what's going on.
> 
> IBM-flavored COBOL has the concept of a RETURN-CODE, a global 16-bit
> integer that is shared by all PROGRAM-ID modules.  (Each is implemented
as
> a C type function.)
> 
> In GCOBOL, a COBOL variable is actually a structure with a data area,
> because COBOL variables have lots of metadata associated with them,
> including the type of storage, the amount of storage, in many cases the
> output format of the variable, whether or not they are signed, offsets
> from parent, number of digits, number of decimal places, and more.  To
> make life easier, we implement RETURN-CODE as a COBOL variable, but its
> data area is a global 16-bit "short" defined in the libgcobol.so.
> 
> So:  Here we go:
> 
> In libgcobol/constants.cc:
>       short __gg__data_return_code = 0;
> 
> In libgcobol/charmaps.h:
> extern short         __gg__data_return_code    ;
> 
> In gcc/cobol/genapi.cc, I need to be able to generate GENERIC that
> accesses that variable.
> 
> So, in gcc/genutil.h:
>      extern tree var_decl_return_code; // short __gg__data_return_code
> 
> In gcc/cobol/genutil.cc:
>      tree var_decl_return_code; // short __gg__data_return_code
> 
> In gcc/cobol/genapicc:
>      SET_VAR_DECL(var_decl_return_code, SHORT,
"__gg__data_return_code");
> 
> That macro results in
>      var_decl_return_code = gg_declare_variable(
>           SHORT,
>           "__gg__data_return_code",
>           NULL_TREE, // Initial value
>           vs_external_reference)
> 
> SHORT is defined in gcc/cobol/gengen.h
>      #define SHORT      short_integer_type_node
> 
> The gg_declare_variable() function is in gengen.cc.  It creates a
var_decl
> for a SHORT with the name "__gg__data_return_code", and these
attributes:
> 
>       DECL_CONTEXT (var_decl) = gg_trans_unit.trans_unit_decl;
>       TREE_USED(var_decl)   = 1;
>       DECL_EXTERNAL (var_decl) = 1;
>       TREE_PUBLIC(var_decl) = 1;
> 
> With that all in place, in genapi.cc we turn our attention to the
> parser_see_stop_run() routine.  That routine uses a static integer named
> "..pssr_retval":
> 
> static tree returned_value = gg_define_variable(INT, "..pssr_retval",
> vs_file_static);
> 
> That variable gets set thusly:
> 
>     gg_assign(returned_value, gg_cast(INT, var_decl_return_code));
> 
> (gg_assign() creates a MODIFY_EXPR, gg_cast does "return
> fold_convert(type, var);"
> 
> and that variable gets used like this:
> 
>   gg_exit(returned_value);
> 
> gg_exit is found in gengen.cc:
> 
>      void
>      gg_exit(tree exit_code)
>        {
>        tree the_call =
>            build_call_expr_loc(location_from_lineno(),
>                                builtin_decl_explicit (BUILT_IN_EXIT),
>                                1,
>                                exit_code);
>        gg_append_statement(the_call);
>        }
> 
> So, we have the stage set. When the prior version of GCOBOL (without the
> flag_strict_aliasing=0) compiles this program
> 
>         PROGRAM-ID.      PROG.
>         PROCEDURE        DIVISION.
>         MOVE 1 TO RETURN-CODE
>         STOP RUN.
> 
> with -O0 or -O1, the executable terminates with 1.
> 
> With -O2, -O3, or -Os, the executable terminates with 0.
> 
> It was at this point that I gave up.  As far as I can tell I am creating
a
> signed short, I am picking it up, I am casting it to a signed int, and I
> am passing that to exit().
> 
> I don't know what else to try.
> 
> With flag_strict_aliasing=0, it works.  Without it, it doesn't.
> 
> I dislike needing that flag_strict_aliasing=0; it feels a bit like
turning
> my back to a tiger.  But here we are.
> 
> Remember: You asked. And thank you for asking.  And if anybody can offer
> any insight as to what's going on here, it will be received with great
> enthusiasm.
> 
> 
> >
> > >
> > > gcc/cobol
> > >
> > >   * cobol1.cc: (cobol_langhook_post_options): Implemented in order
> > > to set
> > >   flag_strict_aliasing to zero.
> > >   * genapi.cc: (set_user_status): Add comment.
> > >   (parser_intrinsic_subst): Expand SHOW_PARSE information.
> > >   (psa_global): Change names of return-code and upsi globals,
> > >   (psa_FldLiteralA): Set DECL_PRESERVE_P for FldLiteralA.
> > >   * gengen.cc: (show_type): Add POINTER type.
> > >   (gg_define_function_with_no_parameters): Set DECL_PRESERVE_P for
> > > COBOL-
> > >   style nested programs.  (gg_array_of_bytes): Fix bad cast.
> > >
> > > libgcobol
> > >
> > >   * charmaps.h: Change __gg__data_return_code to 'short' type.
> > >   * constants.cc: Likewise.
> > > ---
> > >  gcc/cobol/cobol1.cc    | 19 +++++++++++++++++++
> > >  gcc/cobol/genapi.cc    | 19 +++++++++++++++++--
> > >  gcc/cobol/gengen.cc    | 12 ++++++++++--
> > >  libgcobol/charmaps.h   |  2 +-
> > >  libgcobol/constants.cc | 10 +++++-----
> > >  5 files changed, 52 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/gcc/cobol/cobol1.cc b/gcc/cobol/cobol1.cc
> > > index 0d07c460d41..d175ab11e3f 100644
> > > --- a/gcc/cobol/cobol1.cc
> > > +++ b/gcc/cobol/cobol1.cc
> > > @@ -646,6 +646,22 @@ cobol_get_sarif_source_language(const char *)
> > >      return "cobol";
> > >      }
> > >
> > > +bool
> > > +cobol_langhook_post_options(const char**)
> > > +  {
> > > +  // This flag, when set to 0, results in calls to gg_exit working
> > > properly.
> >
> > Comments like this usually have a ??? or similar.
> >
> > > +  // I don't know why it is necessary.  There is something going on
> > with
> > > the
> > > +  // definition of  __gg__data_return_code in constants.cc, and
with
> > how
> > > it
> > > +  // is used through var_decl_return_code in genapi.cc.  Without
it,
> > the
> > > value
> > > +  // delivered to exit@PLT is zero, and not __gg__data_return_code
> > > +  // Dubner, 2025-04-04.
> > > +  flag_strict_aliasing = 0;
> > > +
> > > +  /* Returning false means that the backend should be used.  */
> > > +  return false;
> > > +  }
> > > +
> > > +
> > >  #undef LANG_HOOKS_BUILTIN_FUNCTION
> > >  #undef LANG_HOOKS_GETDECLS
> > >  #undef LANG_HOOKS_GLOBAL_BINDINGS_P
> > > @@ -660,6 +676,7 @@ cobol_get_sarif_source_language(const char *)
> > >  ////#undef LANG_HOOKS_TYPE_FOR_SIZE
> > >  #undef LANG_HOOKS_SET_DECL_ASSEMBLER_NAME
> > >  #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > +#undef LANG_HOOKS_POST_OPTIONS
> > >
> > >  // We use GCC in the name, not GNU, as others do,
> > >  // because "GnuCOBOL" refers to a different GNU project.
> > > @@ -685,6 +702,8 @@ cobol_get_sarif_source_language(const char *)
> > >
> > >  #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > cobol_get_sarif_source_language
> > >
> > > +#define LANG_HOOKS_POST_OPTIONS cobol_langhook_post_options
> > > +
> > >  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> > >
> > >  #include "gt-cobol-cobol1.h"
> > > diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> > > index a0da6476e2a..fbe0bbc75dc 100644
> > > --- a/gcc/cobol/genapi.cc
> > > +++ b/gcc/cobol/genapi.cc
> > > @@ -8806,6 +8806,10 @@ static
> > >  void set_user_status(struct cbl_file_t *file)
> > >    {
> > >    // This routine sets the user_status, if any, to the
> > > cblc_file_t::status
> > > +
> > > +  // We have to do it this way, because in the case where the
> > > file->user_status
> > > +  // is in linkage, the memory addresses can end up pointing to the
> > wrong
> > > +  // places
> > >    if(file->user_status)
> > >      {
> > >      cbl_field_t *user_status =
> > > cbl_field_of(symbol_at(file->user_status));
> > > @@ -10111,6 +10115,13 @@ parser_intrinsic_subst( cbl_field_t *f,
> > >    SHOW_PARSE
> > >      {
> > >      SHOW_PARSE_HEADER
> > > +    SHOW_PARSE_FIELD(" TO ", f)
> > > +    for(size_t i=0; i<argc; i++)
> > > +      {
> > > +      SHOW_PARSE_INDENT
> > > +      SHOW_PARSE_FIELD(" ", argv[i].orig.field)
> > > +      SHOW_PARSE_FIELD(" ", argv[i].replacement.field)
> > > +      }
> > >      SHOW_PARSE_END
> > >      }
> > >    TRACE1
> > > @@ -15908,12 +15919,12 @@ psa_global(cbl_field_t *new_var)
> > >
> > >    if( strcmp(new_var->name, "RETURN-CODE") == 0 )
> > >      {
> > > -    strcpy(ach, "__gg___11_return_code6");
> > > +    strcpy(ach, "__gg__return_code");
> > >      }
> > >
> > >    if( strcmp(new_var->name, "UPSI-0") == 0 )
> > >      {
> > > -    strcpy(ach, "__gg___6_upsi_04");
> > > +    strcpy(ach, "__gg__upsi");
> > >      }
> > >
> > >    new_var->var_decl_node =
gg_declare_variable(cblc_field_type_node,
> > ach,
> > > NULL, vs_external_reference);
> > > @@ -16156,6 +16167,10 @@ psa_FldLiteralA(struct cbl_field_t *field )
> > >                  field->data.initial,
> > >                  NULL_TREE,
> > >                  field->var_decl_node);
> > > +    TREE_READONLY(field->var_decl_node) = 1;
> > > +    TREE_USED(field->var_decl_node) = 1;
> > > +    TREE_STATIC(field->var_decl_node) = 1;
> > > +    DECL_PRESERVE_P (field->var_decl_node) = 1;
> > >      nvar += 1;
> > >      }
> > >    TRACE1
> > > diff --git a/gcc/cobol/gengen.cc b/gcc/cobol/gengen.cc
> > > index ffb64c8993d..e7a4e3c5165 100644
> > > --- a/gcc/cobol/gengen.cc
> > > +++ b/gcc/cobol/gengen.cc
> > > @@ -375,6 +375,10 @@ show_type(tree type)
> > >    static char ach[1024];
> > >    switch( TREE_CODE(type) )
> > >      {
> > > +    case POINTER_TYPE:
> > > +      sprintf(ach, "POINTER");
> > > +      break;
> > > +
> > >      case VOID_TYPE:
> > >        sprintf(ach, "VOID");
> > >        break;
> > > @@ -2548,6 +2552,10 @@ gg_define_function_with_no_parameters(tree
> > > return_type,
> > >      DECL_CONTEXT (function_decl) = gg_trans_unit.trans_unit_decl;
> > >      TREE_PUBLIC(function_decl) = 0;
> > >
> > > +    // This function is file static, but nobody calls it, so
without
> > > +    // intervention -O1+ optimizations will discard it.
> > > +    DECL_PRESERVE_P (function_decl) = 1;
> > > +
> > >      // Append this function to the list of functions and variables
> > >      // associated with the computation module.
> > >      gg_append_var_decl(function_decl);
> > > @@ -3358,8 +3366,8 @@ gg_array_of_size_t( size_t N, size_t *values)
> > >  tree
> > >  gg_array_of_bytes( size_t N, unsigned char *values)
> > >    {
> > > -  tree retval = gg_define_variable(build_pointer_type(UCHAR));
> > > -  gg_assign(retval, gg_cast(build_pointer_type(UCHAR), gg_malloc(
> > > build_int_cst_type(UCHAR, N * sizeof(unsigned char)))));
> > > +  tree retval = gg_define_variable(UCHAR_P);
> > > +  gg_assign(retval, gg_cast(UCHAR_P, gg_malloc(
> > > build_int_cst_type(SIZE_T, N * sizeof(unsigned char)))));
> > >    for(size_t i=0; i<N; i++)
> > >      {
> > >      gg_assign(gg_array_value(retval, i), build_int_cst_type(UCHAR,
> > > values[i]));
> > > diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
> > > index 12968fdf928..6b4e9f5c4b4 100644
> > > --- a/libgcobol/charmaps.h
> > > +++ b/libgcobol/charmaps.h
> > > @@ -297,7 +297,7 @@ extern unsigned char __gg__data_zeros[1]       ;
> > >  extern unsigned char __gg__data_high_values[1] ;
> > >  extern unsigned char __gg__data_quotes[1]      ;
> > >  extern unsigned char __gg__data_upsi_0[2]      ;
> > > -extern unsigned char __gg__data_return_code[2] ;
> > > +extern short         __gg__data_return_code    ;
> > >
> > >  // These are the various hardcoded tables used for conversions.
> > >  extern const unsigned short __gg__one_to_one_values[256];
> > > diff --git a/libgcobol/constants.cc b/libgcobol/constants.cc
> > > index 026f919cacc..d37c791f1b3 100644
> > > --- a/libgcobol/constants.cc
> > > +++ b/libgcobol/constants.cc
> > > @@ -288,7 +288,7 @@ struct cblc_field_t __gg___14_linage_counter6 =
{
> > >
> > >
> > >  unsigned char __gg__data_upsi_0[2] = {0,0};
> > > -struct cblc_field_t __gg___6_upsi_04 = {
> > > +struct cblc_field_t __gg__upsi = {
> > >    .data           = __gg__data_upsi_0 ,
> > >    .capacity       = 2 ,
> > >    .allocated      = 2 ,
> > > @@ -307,9 +307,9 @@ struct cblc_field_t __gg___6_upsi_04 = {
> > >    .dummy          = 0 ,
> > >    };
> > >
> > > -unsigned char __gg__data_return_code[2] = {0,0};
> > > -struct cblc_field_t __gg___11_return_code6 = {
> > > -  .data           = __gg__data_return_code ,
> > > +short __gg__data_return_code = 0;
> > > +struct cblc_field_t __gg__return_code = {
> > > +  .data           = (unsigned char *)&__gg__data_return_code ,
> > >    .capacity       = 2 ,
> > >    .allocated      = 2 ,
> > >    .offset         = 0 ,
> > > @@ -319,7 +319,7 @@ struct cblc_field_t __gg___11_return_code6 = {
> > >    .parent         = NULL,
> > >    .occurs_lower   = 0 ,
> > >    .occurs_upper   = 0 ,
> > > -  .attr           = 0x0 ,
> > > +  .attr           = signable_e ,
> > >    .type           = FldNumericBin5 ,
> > >    .level          = 0 ,
> > >    .digits         = 4 ,

Reply via email to