I'd like to ping the following patch for review: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html
Thanks Dave On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote: > PR other/109163 notes that when we write out JSON files, we traverse > the keys within each object via hash_map iteration, and thus the > ordering is non-deterministic - it can arbitrarily vary from run to > run and from different machines, making it harder for users to > compare > results and determine if anything has "really" changed. > > I'm running into this issue with SARIF output, but there are several > places where we're currently emitting JSON: > > * -fsave-optimization-record emits SRCFILE.opt-record.json.gz > "This option is experimental and the format of the data > within > the compressed JSON file is subject to change."; see > optinfo-emit-json.{h,cc}, dumpfile.cc, etc > * -fdiagnostics-format= with the various "sarif" and "json" options > * -fdump-analyzer-json is a developer option in the analyzer > * gcov has: > "-j, --json-format: Output JSON intermediate format into > .gcov.json.gz file" > > This patch adds an auto_vec to class json::object to preserve > key-insertion order, and use it when writing out objects. > Potentially > this slightly slows down JSON output, but I believe that this isn't > normally a bottleneck, and that the benefits to the user of > deterministic output are worth it. > > I had first attempted to use ordered_hash_map.h for this, but ran > into > impenetrable template errors, so this patch uses a simpler approach > of > just adding an auto_vec to json::object. > > Testing showed a failure of diagnostic-format-json-5.c, which was > using > a convoluted set of regexps to consume the output; I believe that > this > was brittle, and was intermittently failing for some of the random > orderings of output. I rewrote these regexps to work with the > expected > output order. The other such tests seem to pass with the > now-deterministic orderings. > > Lightly tested with valgrind. > I manually verified that the SARIF output is now deterministic. > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > OK for trunk? > > gcc/ChangeLog: > PR other/109163 > * json.cc: Update comments to indicate that we now preserve > insertion order of keys within objects. > (object::print): Traverse keys in insertion order. > (object::set): Preserve insertion order of keys. > (selftest::test_writing_objects): Add an additional key to > verify > that we preserve insertion order. > * json.h (object::m_keys): New field. > > gcc/testsuite/ChangeLog: > PR other/109163 > * c-c++-common/diagnostic-format-json-1.c: Update comment. > * c-c++-common/diagnostic-format-json-2.c: Likewise. > * c-c++-common/diagnostic-format-json-3.c: Likewise. > * c-c++-common/diagnostic-format-json-4.c: Likewise. > * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps. > * c-c++-common/diagnostic-format-json-stderr-1.c: Update > comment. > > Signed-off-by: David Malcolm <dmalc...@redhat.com> > --- > gcc/json.cc | 40 ++++--- > gcc/json.h | 10 +- > .../c-c++-common/diagnostic-format-json-1.c | 3 +- > .../c-c++-common/diagnostic-format-json-2.c | 3 +- > .../c-c++-common/diagnostic-format-json-3.c | 3 +- > .../c-c++-common/diagnostic-format-json-4.c | 3 +- > .../c-c++-common/diagnostic-format-json-5.c | 100 ++++++++++------ > -- > .../diagnostic-format-json-stderr-1.c | 3 +- > 8 files changed, 95 insertions(+), 70 deletions(-) > > diff --git a/gcc/json.cc b/gcc/json.cc > index 01879c9c466..741e97b20e5 100644 > --- a/gcc/json.cc > +++ b/gcc/json.cc > @@ -31,8 +31,11 @@ using namespace json; > /* class json::value. */ > > /* Dump this json::value tree to OUTF. > - No formatting is done. There are no guarantees about the order > - in which the key/value pairs of json::objects are printed. */ > + > + No formatting is done. > + > + The key/value pairs of json::objects are printed in the order > + in which the keys were originally inserted. */ > > void > value::dump (FILE *outf) const > @@ -44,7 +47,7 @@ value::dump (FILE *outf) const > } > > /* class json::object, a subclass of json::value, representing > - an unordered collection of key/value pairs. */ > + an ordered collection of key/value pairs. */ > > /* json:object's dtor. */ > > @@ -62,14 +65,17 @@ object::~object () > void > object::print (pretty_printer *pp) const > { > - /* Note that the order is not guaranteed. */ > pp_character (pp, '{'); > - for (map_t::iterator it = m_map.begin (); it != m_map.end (); > ++it) > + > + /* Iterate in the order that the keys were inserted. */ > + unsigned i; > + const char *key; > + FOR_EACH_VEC_ELT (m_keys, i, key) > { > - if (it != m_map.begin ()) > + if (i > 0) > pp_string (pp, ", "); > - const char *key = const_cast <char *>((*it).first); > - value *value = (*it).second; > + map_t &mut_map = const_cast<map_t &> (m_map); > + value *value = *mut_map.get (key); > pp_doublequote (pp); > pp_string (pp, key); // FIXME: escaping? > pp_doublequote (pp); > @@ -97,9 +103,13 @@ object::set (const char *key, value *v) > *ptr = v; > } > else > - /* If the key wasn't already present, take a copy of the key, > - and store the value. */ > - m_map.put (xstrdup (key), v); > + { > + /* If the key wasn't already present, take a copy of the key, > + and store the value. */ > + char *owned_key = xstrdup (key); > + m_map.put (owned_key, v); > + m_keys.safe_push (owned_key); > + } > } > > /* Get the json::value * for KEY. > @@ -295,15 +305,17 @@ test_object_get () > ASSERT_EQ (obj.get ("not-present"), NULL); > } > > -/* Verify that JSON objects are written correctly. We can't test > more than > - one key/value pair, as we don't impose a guaranteed ordering. */ > +/* Verify that JSON objects are written correctly. */ > > static void > test_writing_objects () > { > object obj; > obj.set ("foo", new json::string ("bar")); > - assert_print_eq (obj, "{\"foo\": \"bar\"}"); > + obj.set ("baz", new json::string ("quux")); > + /* This test relies on json::object writing out key/value pairs > + in key-insertion order. */ > + assert_print_eq (obj, "{\"foo\": \"bar\", \"baz\": \"quux\"}"); > } > > /* Verify that JSON arrays are written correctly. */ > diff --git a/gcc/json.h b/gcc/json.h > index aa52ba2951a..057119db277 100644 > --- a/gcc/json.h > +++ b/gcc/json.h > @@ -82,8 +82,11 @@ class value > void dump (FILE *) const; > }; > > -/* Subclass of value for objects: an unordered collection of > - key/value pairs. */ > +/* Subclass of value for objects: a collection of key/value pairs > + preserving the ordering in which keys were inserted. > + > + Preserving the order eliminates non-determinism in the output, > + making it easier for the user to compare repeated invocations. > */ > > class object : public value > { > @@ -100,6 +103,9 @@ class object : public value > typedef hash_map <char *, value *, > simple_hashmap_traits<nofree_string_hash, value *> > map_t; > map_t m_map; > + > + /* Keep track of order in which keys were inserted. */ > + auto_vec <const char *> m_keys; > }; > > /* Subclass of value for arrays. */ > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c > b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c > index af57eb636d5..6bab30e3e6c 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c > @@ -4,8 +4,7 @@ > #error message > > /* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > + the innermost values, and working outwards. */ > > /* { dg-regexp "\"kind\": \"error\"" } */ > /* { dg-regexp "\"column-origin\": 1" } */ > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c > b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c > index edb802efb8d..3c12103c9f8 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c > @@ -4,8 +4,7 @@ > #warning message > > /* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > + the innermost values, and working outwards. */ > > /* { dg-regexp "\"kind\": \"warning\"" } */ > /* { dg-regexp "\"column-origin\": 1" } */ > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c > b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c > index bb7b8dc5d16..11d74624ff1 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c > @@ -4,8 +4,7 @@ > #warning message > > /* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > + the innermost values, and working outwards. */ > > /* { dg-regexp "\"kind\": \"error\"" } */ > /* { dg-regexp "\"column-origin\": 1" } */ > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c > b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c > index 8ac90723cbd..cec1cf924b4 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c > @@ -10,8 +10,7 @@ int test (void) > } > > /* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > + the innermost values, and working outwards. */ > > /* Verify nested diagnostics. */ > > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c > b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c > index 8d2eb0c5089..86f8c5fb374 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c > @@ -8,49 +8,61 @@ int test (struct s *ptr) > return ptr->colour; > } > > -/* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > +/* Verify fix-it hints. > > -/* { dg-regexp "\"kind\": \"error\"" } */ > -/* { dg-regexp "\"column-origin\": 1" } */ > -/* { dg-regexp "\"escape-source\": false" } */ > -/* { dg-regexp "\"message\": \".*\"" } */ > + Use dg-regexp to consume the JSON output from start to > + finish, relying on the ordering of the keys. > + The following uses indentation to visualize the structure > + of the JSON (although the actual output is all on one line). > > -/* Verify fix-it hints. */ > - > -/* { dg-regexp "\"string\": \"color\"" } */ > - > -/* { dg-regexp "\"start\": \{" } */ > -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- > 5.c\"" } */ > -/* { dg-regexp "\"line\": 8" } */ > -/* { dg-regexp "\"column\": 15" } */ > -/* { dg-regexp "\"display-column\": 15" } */ > -/* { dg-regexp "\"byte-column\": 15" } */ > - > -/* { dg-regexp "\"next\": \{" } */ > -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- > 5.c\"" } */ > -/* { dg-regexp "\"line\": 8" } */ > -/* { dg-regexp "\"column\": 21" } */ > -/* { dg-regexp "\"display-column\": 21" } */ > -/* { dg-regexp "\"byte-column\": 21" } */ > - > -/* { dg-regexp "\"fixits\": \[\[\{\}, \]*\]" } */ > - > -/* { dg-regexp "\"caret\": \{" } */ > -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- > 5.c\"" } */ > -/* { dg-regexp "\"line\": 8" } */ > -/* { dg-regexp "\"column\": 15" } */ > -/* { dg-regexp "\"display-column\": 15" } */ > -/* { dg-regexp "\"byte-column\": 15" } */ > - > -/* { dg-regexp "\"finish\": \{" } */ > -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- > 5.c\"" } */ > -/* { dg-regexp "\"line\": 8" } */ > -/* { dg-regexp "\"column\": 20" } */ > -/* { dg-regexp "\"display-column\": 20" } */ > -/* { dg-regexp "\"byte-column\": 20" } */ > - > -/* { dg-regexp "\"locations\": \[\[\{\}, \]*\]" } */ > -/* { dg-regexp "\"children\": \[\[\]\[\]\]" } */ > -/* { dg-regexp "\[\[\{\}, \]*\]" } */ > + { dg-regexp {\[} } > + { dg-regexp {\{} } > + { dg-regexp {"kind": "error"} } > + { dg-regexp {, "message": "'struct s' has no member named > 'colour'; did you mean 'color'\?"} } > + { dg-regexp {, "children": \[\]} } > + { dg-regexp {, "column-origin": 1} } > + { dg-regexp {, "locations": } } > + { dg-regexp {\[} } > + { dg-regexp {\{} } > + { dg-regexp {"caret": } } > + { dg-regexp {\{} } > + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- > json-5.c"} } > + { dg-regexp {, "line": 8} } > + { dg-regexp {, "display-column": 15} } > + { dg-regexp {, "byte-column": 15} } > + { dg-regexp {, "column": 15} } > + { dg-regexp {\}} } > + { dg-regexp {, "finish": } } > + { dg-regexp {\{} } > + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- > json-5.c"} } > + { dg-regexp {, "line": 8} } > + { dg-regexp {, "display-column": 20} } > + { dg-regexp {, "byte-column": 20} } > + { dg-regexp {, "column": 20} } > + { dg-regexp {\}} } > + { dg-regexp {\}} } > + { dg-regexp {\]} } > + { dg-regexp {, "fixits": } } > + { dg-regexp {\[} } > + { dg-regexp {\{} } > + { dg-regexp {"start": } } > + { dg-regexp {\{} } > + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- > json-5.c"} } > + { dg-regexp {, "line": 8} } > + { dg-regexp {, "display-column": 15} } > + { dg-regexp {, "byte-column": 15} } > + { dg-regexp {, "column": 15} } > + { dg-regexp {\}} } > + { dg-regexp {, "next": } } > + { dg-regexp {\{} } > + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- > json-5.c"} } > + { dg-regexp {, "line": 8} } > + { dg-regexp {, "display-column": 21} } > + { dg-regexp {, "byte-column": 21} } > + { dg-regexp {, "column": 21} } > + { dg-regexp {\}} } > + { dg-regexp {, "string": "color"} } > + { dg-regexp {\}} } > + { dg-regexp {\]} } > + { dg-regexp {, "escape-source": false\}} } > + { dg-regexp {\]} } */ > diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json- > stderr-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json- > stderr-1.c > index 02f780bce10..bcfa92110f5 100644 > --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c > +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c > @@ -6,8 +6,7 @@ > #error message > > /* Use dg-regexp to consume the JSON output starting with > - the innermost values, and working outwards. > - We can't rely on any ordering of the keys. */ > + the innermost values, and working outwards. */ > > /* { dg-regexp "\"kind\": \"error\"" } */ > /* { dg-regexp "\"column-origin\": 1" } */