On 1/21/21 3:01 PM, Jan Hubicka wrote:

Plus I'm planning to send one more patch that will ignore the time profile when
-fprofile-reproducible != serial.

Why do you need to disable time profiling?

Because you can have two training runs (running in parallel) where the order is:
runA: foo -> bar
runB: bar -> foo

The final output then depends on the order in which the profiles are merged.
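To make the order-dependence argument concrete, here is a minimal C sketch with two hypothetical per-run profiles matching runA and runB. The merge function, merge_first_wins, is an assumed, deliberately simple non-commutative policy ("first non-zero value merged wins"); it is not claimed to be how libgcov merges time-profile counters. It only illustrates how a merge of that kind lets the merge order decide which function is laid out first. All names here (run_profile, merge_first_wins, first_emitted) are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-run time profile: for each function, the position at
   which it was first executed in that run (1 = first, 0 = never run).  */
struct run_profile
{
  int foo_first_run;
  int bar_first_run;
};

/* HYPOTHETICAL merge policy, "first non-zero value merged wins".  It is
   used only because it is the simplest merge that is not commutative; it
   is NOT claimed to be how libgcov merges time-profile counters.  */
static struct run_profile
merge_first_wins (struct run_profile merged, struct run_profile incoming)
{
  if (merged.foo_first_run == 0)
    merged.foo_first_run = incoming.foo_first_run;
  if (merged.bar_first_run == 0)
    merged.bar_first_run = incoming.bar_first_run;
  return merged;
}

/* Name of the function that would be laid out first if output were
   sorted by the merged first-run position (ties go to foo).  */
static const char *
first_emitted (struct run_profile p)
{
  assert (p.foo_first_run > 0 && p.bar_first_run > 0);
  return p.foo_first_run <= p.bar_first_run ? "foo" : "bar";
}
```

With runA = { 1, 2 } and runB = { 2, 1 }, merging runA and then runB leaves foo first, while merging runB and then runA leaves bar first, even though the same two runs were merged.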

I would like to address it with the attached patch.

Martin


Honza


From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
From: Martin Liska <mli...@suse.cz>
Date: Thu, 21 Jan 2021 09:22:45 +0100
Subject: [PATCH 2/2] Consider time profilers only when
 -fprofile-reproducible=serial.

gcc/ChangeLog:

	PR gcov-profile/98739
	* cgraphunit.c (expand_all_functions): Consider tp_first_run
	only when -fprofile-reproducible=serial.

gcc/lto/ChangeLog:

	PR gcov-profile/98739
	* lto-partition.c (lto_balanced_map): Consider tp_first_run
	only when -fprofile-reproducible=serial.
---
 gcc/cgraphunit.c        | 5 +++--
 gcc/lto/lto-partition.c | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b401f0817a3..042c03d819e 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1961,8 +1961,9 @@ expand_all_functions (void)
       }
 
   /* First output functions with time profile in specified order.  */
-  qsort (tp_first_run_order, tp_first_run_order_pos,
-	 sizeof (cgraph_node *), tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+    qsort (tp_first_run_order, tp_first_run_order_pos,
+	   sizeof (cgraph_node *), tp_first_run_node_cmp);
   for (i = 0; i < tp_first_run_order_pos; i++)
     {
       node = tp_first_run_order[i];
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 15761ac9eb5..f9e632776e6 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
      unit tends to import a lot of global trees defined there.  We should
      get better about minimizing the function bounday, but until that
      things works smoother if we order in source order.  */
-  order.qsort (tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+    order.qsort (tp_first_run_node_cmp);
   noreorder.qsort (node_cmp);
 
   if (dump_file)
-- 
2.30.0
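Both hunks make the same behavioural change: the sort by tp_first_run is guarded by a reproducibility-mode check, and in the other modes the nodes keep the order in which they were collected. A minimal standalone sketch of that guard follows. The names (sketch_node, sketch_order_for_output, the SKETCH_* enum) are hypothetical stand-ins for cgraph_node, flag_profile_reproducible and PROFILE_REPRODUCIBILITY_SERIAL, and the comparator is a simplified one in the spirit of tp_first_run_node_cmp, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins modelled on the -fprofile-reproducible values;
   the patch itself tests flag_profile_reproducible against
   PROFILE_REPRODUCIBILITY_SERIAL.  */
enum sketch_reproducibility
{
  SKETCH_SERIAL,
  SKETCH_PARALLEL_RUNS,
  SKETCH_MULTITHREADED
};

/* Hypothetical, much-reduced stand-in for cgraph_node.  */
struct sketch_node
{
  const char *name;
  unsigned tp_first_run;   /* 0 means "no time profile".  */
  int order;               /* Tie-breaker.  */
};

/* Simplified comparator: profiled nodes first, ascending by first-run
   time, ties broken by ORDER.  Not GCC's tp_first_run_node_cmp.  */
static int
sketch_tp_first_run_cmp (const void *pa, const void *pb)
{
  const struct sketch_node *a = pa;
  const struct sketch_node *b = pb;

  if (a->tp_first_run != b->tp_first_run)
    {
      if (a->tp_first_run == 0)
        return 1;
      if (b->tp_first_run == 0)
        return -1;
      return a->tp_first_run < b->tp_first_run ? -1 : 1;
    }
  return (a->order > b->order) - (a->order < b->order);
}

/* Mirrors the guard added by the patch: sort by time profile only in
   serial mode; otherwise leave NODES in the order it was collected.  */
static void
sketch_order_for_output (struct sketch_node *nodes, size_t n,
                         enum sketch_reproducibility mode)
{
  assert (nodes != NULL || n == 0);
  if (mode == SKETCH_SERIAL && n > 1)
    qsort (nodes, n, sizeof *nodes, sketch_tp_first_run_cmp);
}
```

The n > 1 check simply skips a pointless sort and avoids passing a possibly null pointer to qsort.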
