On 23/02/2018 11:58, Petri Latvala wrote:
On Fri, Feb 23, 2018 at 11:34:53AM +0000, Tvrtko Ursulin wrote:
From: Chris Wilson <ch...@chris-wilson.co.uk>

CPU hotplug, especially CPU0, can be flaky on commodity hardware.

To improve test reliability and reponse times when testing larger runs we
need to handle those cases better.

Handle failures to off-line a CPU by immediately skipping the test, and
failures to on-line a CPU by immediately rebooting the machine.

This patch includes igt_sysrq_reboot implementation from Chris Wilson.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
---
  lib/Makefile.sources |  2 ++
  lib/igt_sysrq.c      | 22 ++++++++++++++++++++++
  lib/igt_sysrq.h      | 30 ++++++++++++++++++++++++++++++
  lib/meson.build      |  2 ++
  tests/perf_pmu.c     | 42 ++++++++++++++++++++++++++++++++++--------
  5 files changed, 90 insertions(+), 8 deletions(-)
  create mode 100644 lib/igt_sysrq.c
  create mode 100644 lib/igt_sysrq.h

diff --git a/lib/Makefile.sources b/lib/Makefile.sources
index 5b13ef8896c0..3d37ef1d1984 100644
--- a/lib/Makefile.sources
+++ b/lib/Makefile.sources
@@ -35,6 +35,8 @@ lib_source_list =             \
        igt_stats.h             \
        igt_sysfs.c             \
        igt_sysfs.h             \
+       igt_sysrq.c             \
+       igt_sysrq.h             \
        igt_x86.h               \
        igt_x86.c               \
        igt_vgem.c              \
diff --git a/lib/igt_sysrq.c b/lib/igt_sysrq.c
new file mode 100644
index 000000000000..fe3d2e344ff1
--- /dev/null
+++ b/lib/igt_sysrq.c
@@ -0,0 +1,22 @@
+#include <unistd.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <sys/reboot.h>
+
+#include "igt_core.h"
+
+#include "igt_sysrq.h"
+
+void igt_sysrq_reboot(void)
+{
+       sync();
+
+       /* Try to be nice at first, and if that fails pull the trigger */
+       if (reboot(RB_AUTOBOOT)) {
+               int fd = open("/proc/sysrq-trigger", O_WRONLY);
+               igt_ignore_warn(write(fd, "b", 2));
+               close(fd);
+       }
+
+       abort();
+}


While the cause for taking this action might be dire, rebooting
people's machines can be kind of a dick move, even considering they're
running tests that can be fatal to the machine in other ways.

We have IGT_HANG and IGT_HANG_WITHOUT_RESET so the users can opt
in/out of some fatal behaviour already. I'm fine with auto-rebooting,
even as the default, if users can opt out of it with
IGT_NO_REBOOT_PRETTY_PLEASE or so.

I am fine with something like that. Just lets define how to call the env variable and what the default should be?

Do we have a return code from a test which stops the test runner?

I am thinking that the best approach would be not to reboot but to halt testing, unless this environment option is set.

But then it is up to CI people to say if they want to be setting this option across all systems, or would actually prefer to reboot by default.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to