> -----Original Message-----
> From: Sam James <mailto:s...@gentoo.org>
> Sent: Friday, April 4, 2025 14:28
> To: Robert Dubner <mailto:rdub...@symas.com>
> Cc: 'GCC Patches' <mailto:gcc-patches@gcc.gnu.org>
> Subject: Re: [committed] cobol: Eliminate cobolworx UAT errors when
> compiling with -Os
> 
> Robert Dubner <mailto:rdub...@symas.com> writes:
> 
> > From e70fe5ed46ab129a8b1da961c47d3fb75b11b988 Mon Sep 17 00:00:00 2001
> > From: Bob Dubner mailto:rdub...@symas.com
> > Date: Fri, 4 Apr 2025 13:48:58 -0400
> > Subject: [PATCH] cobol: Eliminate cobolworx UAT errors when compiling
> with
> > -Os
> >
> > Testcases compiled with -Os were failing because static functions and
> > static
> > variables were being optimized away, because of improper data type
> casts,
> > and
> > because strict aliasing (whatever that is) was resulting in some loss
> > of
> 
> Are you unfamiliar with that from C and C++? See
> https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 (we all
> have our favourite documents to explain it) but I don't know if COBOL is
> amenable to the concept of TBAA.
> 
> > data.
> > These changes eliminate those known problems.
> 
> I'd suggest that this should be accompanied by some question, otherwise
> it's going to live there forever and it's not necessarily right (though
> see above - if COBOL is incompatible with the idea, it might need
> something along those lines, though not sure this is the right way of
> doing that).
> 
> That is, while you're free to approve your own COBOL patches, you're
> also free to CC others and ask them for advice even if not explicit
> approval before pushing them if something doesn't seem to be correct or
> is a hack.

I am not any kind of compiler expert, except, possibly in one way: I don't
trust them.  I have never trusted them.  I don't trust them, and I don't
trust the computers they run on.  I regard compilers the same way a
lion-tamer regards the big cats they work with every day.  I treat them
with firm respect; I expect them to behave the way they've been trained --
and I never turn my back on them.  So, no, I don't understand TBAA; I had
to look it up.  I don't understand aliasing.

The problem at hand is not a COBOL problem.  It is a Bob problem.  Richard
pointed me at the original root of the thing.  I believe I have addressed
that.  But the problem has not gone away.  The "flag_strict_aliasing = 0;"
solution makes the symptoms go away.  I spent a couple of hours messing
with this, and I was unable to resolve it.  So I have shrugged and used
what I have.

You probably have figured out by now that when I need help, it's because
1) I have missed something stupid, 2) I was just plain ignorant, or 3)
It's a hard problem.  

And because it's a hard problem, it's hard to describe.  But you sort of
asked, so I will sort of try to explain what's going on.

IBM-flavored COBOL has the concept of a RETURN-CODE, a global 16-bit
integer that is shared by all PROGRAM-ID modules.  (Each is implemented as
a C type function.)

In GCOBOL, a COBOL variable is actually a structure with a data area,
because COBOL variables have lots of metadata associated with them,
including the type of storage, the amount of storage, in many cases the
output format of the variable, whether or not they are signed, offsets
from parent, number of digits, number of decimal places, and more.  To
make life easier, we implement RETURN-CODE as a COBOL variable, but its
data area is a global 16-bit "short" defined in the libgcobol.so.

So:  Here we go:  

In libgcobol/constants.cc: 
        short __gg__data_return_code = 0;

In libgcobol/charmaps.h:
extern short         __gg__data_return_code    ;

In gcc/cobol/genapi.cc, I need to be able to generate GENERIC that
accesses that variable.

So, in gcc/genutil.h:
     extern tree var_decl_return_code; // short __gg__data_return_code

In gcc/cobol/genutil.cc:
     tree var_decl_return_code; // short __gg__data_return_code

In gcc/cobol/genapicc:
     SET_VAR_DECL(var_decl_return_code, SHORT, "__gg__data_return_code");

That macro results in 
     var_decl_return_code = gg_declare_variable(
          SHORT, 
          "__gg__data_return_code",
          NULL_TREE, // Initial value
          vs_external_reference)

SHORT is defined in gcc/cobol/gengen.h 
     #define SHORT      short_integer_type_node

The gg_declare_variable() function is in gengen.cc.  It creates a var_decl
for a SHORT with the name "__gg__data_return_code", and these attributes:

      DECL_CONTEXT (var_decl) = gg_trans_unit.trans_unit_decl;
      TREE_USED(var_decl)   = 1;
      DECL_EXTERNAL (var_decl) = 1;
      TREE_PUBLIC(var_decl) = 1;

With that all in place, in genapi.cc we turn our attention to the
parser_see_stop_run() routine.  That routine uses a static integer named
"..pssr_retval":

static tree returned_value = gg_define_variable(INT, "..pssr_retval",
vs_file_static);

That variable gets set thusly:

    gg_assign(returned_value, gg_cast(INT, var_decl_return_code));

(gg_assign() creates a MODIFY_EXPR, gg_cast does "return
fold_convert(type, var);"

and that variable gets used like this:

  gg_exit(returned_value);

gg_exit is found in gengen.cc:

     void
     gg_exit(tree exit_code)
       {
       tree the_call =
           build_call_expr_loc(location_from_lineno(),
                               builtin_decl_explicit (BUILT_IN_EXIT),
                               1,
                               exit_code);
       gg_append_statement(the_call);
       }

So, we have the stage set. When the prior version of GCOBOL (without the
flag_strict_aliasing=0) compiles this program

        PROGRAM-ID.      PROG.
        PROCEDURE        DIVISION.
        MOVE 1 TO RETURN-CODE
        STOP RUN.

with -O0 or -O1, the executable terminates with 1.

With -O2, -O3, or -Os, the executable terminates with 0.

It was at this point that I gave up.  As far as I can tell I am creating a
signed short, I am picking it up, I am casting it to a signed int, and I
am passing that to exit().  

I don't know what else to try.  

With flag_strict_aliasing=0, it works.  Without it, it doesn't.

I dislike needing that flag_strict_aliasing=0; it feels a bit like turning
my back to a tiger.  But here we are.

Remember: You asked. And thank you for asking.  And if anybody can offer
any insight as to what's going on here, it will be received with great
enthusiasm.


> 
> >
> > gcc/cobol
> >
> >     * cobol1.cc: (cobol_langhook_post_options): Implemented in order
> > to set
> >     flag_strict_aliasing to zero.
> >     * genapi.cc: (set_user_status): Add comment.
> >     (parser_intrinsic_subst): Expand SHOW_PARSE information.
> >     (psa_global): Change names of return-code and upsi globals,
> >     (psa_FldLiteralA): Set DECL_PRESERVE_P for FldLiteralA.
> >     * gengen.cc: (show_type): Add POINTER type.
> >     (gg_define_function_with_no_parameters): Set DECL_PRESERVE_P for
> > COBOL-
> >     style nested programs.  (gg_array_of_bytes): Fix bad cast.
> >
> > libgcobol
> >
> >     * charmaps.h: Change __gg__data_return_code to 'short' type.
> >     * constants.cc: Likewise.
> > ---
> >  gcc/cobol/cobol1.cc    | 19 +++++++++++++++++++
> >  gcc/cobol/genapi.cc    | 19 +++++++++++++++++--
> >  gcc/cobol/gengen.cc    | 12 ++++++++++--
> >  libgcobol/charmaps.h   |  2 +-
> >  libgcobol/constants.cc | 10 +++++-----
> >  5 files changed, 52 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/cobol/cobol1.cc b/gcc/cobol/cobol1.cc
> > index 0d07c460d41..d175ab11e3f 100644
> > --- a/gcc/cobol/cobol1.cc
> > +++ b/gcc/cobol/cobol1.cc
> > @@ -646,6 +646,22 @@ cobol_get_sarif_source_language(const char *)
> >      return "cobol";
> >      }
> >
> > +bool
> > +cobol_langhook_post_options(const char**)
> > +  {
> > +  // This flag, when set to 0, results in calls to gg_exit working
> > properly.
> 
> Comments like this usually have a ??? or similar.
> 
> > +  // I don't know why it is necessary.  There is something going on
> with
> > the
> > +  // definition of  __gg__data_return_code in constants.cc, and with
> how
> > it
> > +  // is used through var_decl_return_code in genapi.cc.  Without it,
> the
> > value
> > +  // delivered to exit@PLT is zero, and not __gg__data_return_code
> > +  // Dubner, 2025-04-04.
> > +  flag_strict_aliasing = 0;
> > +
> > +  /* Returning false means that the backend should be used.  */
> > +  return false;
> > +  }
> > +
> > +
> >  #undef LANG_HOOKS_BUILTIN_FUNCTION
> >  #undef LANG_HOOKS_GETDECLS
> >  #undef LANG_HOOKS_GLOBAL_BINDINGS_P
> > @@ -660,6 +676,7 @@ cobol_get_sarif_source_language(const char *)
> >  ////#undef LANG_HOOKS_TYPE_FOR_SIZE
> >  #undef LANG_HOOKS_SET_DECL_ASSEMBLER_NAME
> >  #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > +#undef LANG_HOOKS_POST_OPTIONS
> >
> >  // We use GCC in the name, not GNU, as others do,
> >  // because "GnuCOBOL" refers to a different GNU project.
> > @@ -685,6 +702,8 @@ cobol_get_sarif_source_language(const char *)
> >
> >  #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > cobol_get_sarif_source_language
> >
> > +#define LANG_HOOKS_POST_OPTIONS cobol_langhook_post_options
> > +
> >  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> >
> >  #include "gt-cobol-cobol1.h"
> > diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> > index a0da6476e2a..fbe0bbc75dc 100644
> > --- a/gcc/cobol/genapi.cc
> > +++ b/gcc/cobol/genapi.cc
> > @@ -8806,6 +8806,10 @@ static
> >  void set_user_status(struct cbl_file_t *file)
> >    {
> >    // This routine sets the user_status, if any, to the
> > cblc_file_t::status
> > +
> > +  // We have to do it this way, because in the case where the
> > file->user_status
> > +  // is in linkage, the memory addresses can end up pointing to the
> wrong
> > +  // places
> >    if(file->user_status)
> >      {
> >      cbl_field_t *user_status =
> > cbl_field_of(symbol_at(file->user_status));
> > @@ -10111,6 +10115,13 @@ parser_intrinsic_subst( cbl_field_t *f,
> >    SHOW_PARSE
> >      {
> >      SHOW_PARSE_HEADER
> > +    SHOW_PARSE_FIELD(" TO ", f)
> > +    for(size_t i=0; i<argc; i++)
> > +      {
> > +      SHOW_PARSE_INDENT
> > +      SHOW_PARSE_FIELD(" ", argv[i].orig.field)
> > +      SHOW_PARSE_FIELD(" ", argv[i].replacement.field)
> > +      }
> >      SHOW_PARSE_END
> >      }
> >    TRACE1
> > @@ -15908,12 +15919,12 @@ psa_global(cbl_field_t *new_var)
> >
> >    if( strcmp(new_var->name, "RETURN-CODE") == 0 )
> >      {
> > -    strcpy(ach, "__gg___11_return_code6");
> > +    strcpy(ach, "__gg__return_code");
> >      }
> >
> >    if( strcmp(new_var->name, "UPSI-0") == 0 )
> >      {
> > -    strcpy(ach, "__gg___6_upsi_04");
> > +    strcpy(ach, "__gg__upsi");
> >      }
> >
> >    new_var->var_decl_node = gg_declare_variable(cblc_field_type_node,
> ach,
> > NULL, vs_external_reference);
> > @@ -16156,6 +16167,10 @@ psa_FldLiteralA(struct cbl_field_t *field )
> >                  field->data.initial,
> >                  NULL_TREE,
> >                  field->var_decl_node);
> > +    TREE_READONLY(field->var_decl_node) = 1;
> > +    TREE_USED(field->var_decl_node) = 1;
> > +    TREE_STATIC(field->var_decl_node) = 1;
> > +    DECL_PRESERVE_P (field->var_decl_node) = 1;
> >      nvar += 1;
> >      }
> >    TRACE1
> > diff --git a/gcc/cobol/gengen.cc b/gcc/cobol/gengen.cc
> > index ffb64c8993d..e7a4e3c5165 100644
> > --- a/gcc/cobol/gengen.cc
> > +++ b/gcc/cobol/gengen.cc
> > @@ -375,6 +375,10 @@ show_type(tree type)
> >    static char ach[1024];
> >    switch( TREE_CODE(type) )
> >      {
> > +    case POINTER_TYPE:
> > +      sprintf(ach, "POINTER");
> > +      break;
> > +
> >      case VOID_TYPE:
> >        sprintf(ach, "VOID");
> >        break;
> > @@ -2548,6 +2552,10 @@ gg_define_function_with_no_parameters(tree
> > return_type,
> >      DECL_CONTEXT (function_decl) = gg_trans_unit.trans_unit_decl;
> >      TREE_PUBLIC(function_decl) = 0;
> >
> > +    // This function is file static, but nobody calls it, so without
> > +    // intervention -O1+ optimizations will discard it.
> > +    DECL_PRESERVE_P (function_decl) = 1;
> > +
> >      // Append this function to the list of functions and variables
> >      // associated with the computation module.
> >      gg_append_var_decl(function_decl);
> > @@ -3358,8 +3366,8 @@ gg_array_of_size_t( size_t N, size_t *values)
> >  tree
> >  gg_array_of_bytes( size_t N, unsigned char *values)
> >    {
> > -  tree retval = gg_define_variable(build_pointer_type(UCHAR));
> > -  gg_assign(retval, gg_cast(build_pointer_type(UCHAR), gg_malloc(
> > build_int_cst_type(UCHAR, N * sizeof(unsigned char)))));
> > +  tree retval = gg_define_variable(UCHAR_P);
> > +  gg_assign(retval, gg_cast(UCHAR_P, gg_malloc(
> > build_int_cst_type(SIZE_T, N * sizeof(unsigned char)))));
> >    for(size_t i=0; i<N; i++)
> >      {
> >      gg_assign(gg_array_value(retval, i), build_int_cst_type(UCHAR,
> > values[i]));
> > diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
> > index 12968fdf928..6b4e9f5c4b4 100644
> > --- a/libgcobol/charmaps.h
> > +++ b/libgcobol/charmaps.h
> > @@ -297,7 +297,7 @@ extern unsigned char __gg__data_zeros[1]       ;
> >  extern unsigned char __gg__data_high_values[1] ;
> >  extern unsigned char __gg__data_quotes[1]      ;
> >  extern unsigned char __gg__data_upsi_0[2]      ;
> > -extern unsigned char __gg__data_return_code[2] ;
> > +extern short         __gg__data_return_code    ;
> >
> >  // These are the various hardcoded tables used for conversions.
> >  extern const unsigned short __gg__one_to_one_values[256];
> > diff --git a/libgcobol/constants.cc b/libgcobol/constants.cc
> > index 026f919cacc..d37c791f1b3 100644
> > --- a/libgcobol/constants.cc
> > +++ b/libgcobol/constants.cc
> > @@ -288,7 +288,7 @@ struct cblc_field_t __gg___14_linage_counter6 = {
> >
> >
> >  unsigned char __gg__data_upsi_0[2] = {0,0};
> > -struct cblc_field_t __gg___6_upsi_04 = {
> > +struct cblc_field_t __gg__upsi = {
> >    .data           = __gg__data_upsi_0 ,
> >    .capacity       = 2 ,
> >    .allocated      = 2 ,
> > @@ -307,9 +307,9 @@ struct cblc_field_t __gg___6_upsi_04 = {
> >    .dummy          = 0 ,
> >    };
> >
> > -unsigned char __gg__data_return_code[2] = {0,0};
> > -struct cblc_field_t __gg___11_return_code6 = {
> > -  .data           = __gg__data_return_code ,
> > +short __gg__data_return_code = 0;
> > +struct cblc_field_t __gg__return_code = {
> > +  .data           = (unsigned char *)&__gg__data_return_code ,
> >    .capacity       = 2 ,
> >    .allocated      = 2 ,
> >    .offset         = 0 ,
> > @@ -319,7 +319,7 @@ struct cblc_field_t __gg___11_return_code6 = {
> >    .parent         = NULL,
> >    .occurs_lower   = 0 ,
> >    .occurs_upper   = 0 ,
> > -  .attr           = 0x0 ,
> > +  .attr           = signable_e ,
> >    .type           = FldNumericBin5 ,
> >    .level          = 0 ,
> >    .digits         = 4 ,

Reply via email to