On 1/21/21 3:01 PM, Jan Hubicka wrote:
> > Plus I'm planning to send one more patch that will ignore the time profile when
> > -fprofile-reproducible != serial.
>
> Why do you need to disable time profiling?
Because you can have two training runs (running in parallel) where the order is:
runA: foo -> bar
runB: bar -> foo

The final output then depends on the order in which the profiles are merged.
I would like to address it with the attached patch.
Martin
> Honza
From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
From: Martin Liska <mli...@suse.cz>
Date: Thu, 21 Jan 2021 09:22:45 +0100
Subject: [PATCH 2/2] Consider time profilers only when
-fprofile-reproducible=serial.

gcc/ChangeLog:

	PR gcov-profile/98739
	* cgraphunit.c (expand_all_functions): Consider tp_first_run
	only when -fprofile-reproducible=serial.

gcc/lto/ChangeLog:

	PR gcov-profile/98739
	* lto-partition.c (lto_balanced_map): Consider tp_first_run
	only when -fprofile-reproducible=serial.
---
 gcc/cgraphunit.c        | 5 +++--
 gcc/lto/lto-partition.c | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b401f0817a3..042c03d819e 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1961,8 +1961,9 @@ expand_all_functions (void)
       }
 
   /* First output functions with time profile in specified order. */
-  qsort (tp_first_run_order, tp_first_run_order_pos,
-         sizeof (cgraph_node *), tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+    qsort (tp_first_run_order, tp_first_run_order_pos,
+           sizeof (cgraph_node *), tp_first_run_node_cmp);
   for (i = 0; i < tp_first_run_order_pos; i++)
     {
       node = tp_first_run_order[i];
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 15761ac9eb5..f9e632776e6 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
      unit tends to import a lot of global trees defined there. We should
      get better about minimizing the function bounday, but until that
      things works smoother if we order in source order. */
-  order.qsort (tp_first_run_node_cmp);
+  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
+    order.qsort (tp_first_run_node_cmp);
   noreorder.qsort (node_cmp);
 
   if (dump_file)
--
2.30.0
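
For illustration only, here is a minimal, self-contained C sketch of the rule both hunks implement: the functions are sorted by their first-run time only in serial mode, and otherwise the array keeps whatever order it was built in. Everything in it is a hypothetical stand-in, not GCC code: `struct fn_info` models a cgraph node, `enum repro_mode` stands in for the PROFILE_REPRODUCIBILITY_* values, and `tp_first_run_cmp`, `order_functions` and `position_of` are illustrative names. The comparator assumes the intent of tp_first_run_node_cmp: an earlier first run sorts first, functions without a time profile (0) go after profiled ones, and ties fall back to source order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the PROFILE_REPRODUCIBILITY_* values;
   only the serial vs. non-serial distinction matters here.  */
enum repro_mode { REPRO_SERIAL, REPRO_PARALLEL_RUNS };

/* Hypothetical, much-reduced model of a cgraph node.  */
struct fn_info
{
  const char *name;
  int order;                  /* Source order, used to break ties.  */
  unsigned int tp_first_run;  /* 0 means "no time profile".  */
};

/* Earlier first run sorts first; unprofiled functions (0) sort after
   all profiled ones; equal values fall back to source order.  */
static int
tp_first_run_cmp (const void *pa, const void *pb)
{
  const struct fn_info *a = pa;
  const struct fn_info *b = pb;
  unsigned int ta = a->tp_first_run, tb = b->tp_first_run;

  if (ta == tb)
    return (a->order > b->order) - (a->order < b->order);
  if (ta == 0)
    return 1;   /* A has no profile: place it after B.  */
  if (tb == 0)
    return -1;  /* B has no profile: place it after A.  */
  return ta < tb ? -1 : 1;
}

/* Reorder FNS only when the time profile is trusted (serial mode);
   in any other mode leave the array exactly as it was built.  */
static void
order_functions (struct fn_info *fns, size_t n, enum repro_mode mode)
{
  assert (n == 0 || fns != NULL);
  if (mode == REPRO_SERIAL)
    qsort (fns, n, sizeof fns[0], tp_first_run_cmp);
}

/* Return the position of NAME in FNS, or -1 if it is absent.  */
static int
position_of (const struct fn_info *fns, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (fns[i].name, name) == 0)
      return (int) i;
  return -1;
}
```

For example, an array holding bar (first run 2), baz (no profile) and foo (first run 1), in that order, becomes foo, bar, baz with REPRO_SERIAL, while REPRO_PARALLEL_RUNS leaves it as bar, baz, foo.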