More notekeeping: I added
  udev_dbg(udev_monitor->udev, "udev_monitor_receive_device: start\n");
at the very beginning of udev_monitor_receive_device(). In theory, any
"start" of this should either be matched by a "success" at the end, or
an "unable to receive message: Resource temporarily unavailable". In a
successful test run this is true, but in a failed run I get 14 "start"s
and 3 "Resource temporarily unavailable", which leaves exactly 11
"starts" which ought to succeed. But I only get 9 successes, reflecting
the "actual: 9 expected: 11" failure.
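The bookkeeping above can be sketched as a small log tally. This is only an illustration: the struct and function names are made up, and matching on the substring "success" is an assumption about the debug message wording, not something taken from libudev.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration; not part of libudev or Mir.
struct ReceiveTally {
    int starts = 0;       // "start" debug lines
    int would_block = 0;  // "unable to receive message" lines
    int successes = 0;    // lines assumed to mark a successful receive

    // Starts that ended in neither a success nor a receive error.
    int unaccounted() const { return starts - would_block - successes; }
};

// Classify each log line by the first marker it contains.
ReceiveTally tally_receive_log(const std::string& log)
{
    ReceiveTally t;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("udev_monitor_receive_device: start") != std::string::npos)
            ++t.starts;
        else if (line.find("unable to receive message") != std::string::npos)
            ++t.would_block;
        else if (line.find("success") != std::string::npos)  // assumed wording
            ++t.successes;
    }
    return t;
}
```

With the counts from the failed run (14 starts, 3 receive errors, 9 successes), unaccounted() comes out as 2, which matches the two missing events.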
Indeed, I see two blocks like this:
  libudev: udev_monitor_receive_device: udev_monitor_receive_device: start
  libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'
  libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'
with neither a success nor an "unable to receive message" following
them, which means that some exit path of udev_monitor_receive_device()
eats the event.
I now added udev_dbg()s to all exit paths which didn't yet have one.
This revealed that the event was correctly read from the netlink socket,
but then discarded here:
  /* skip device, if it does not pass the current filter */
  if (!passes_filter(udev_monitor, udev_device)) {
          struct pollfd pfd[1];
          int rc;
          udev_device_unref(udev_device);
src/platform/udev_wrapper.cpp has wrappers for
udev_monitor_filter_add_match_subsystem_devtype(), and for this test the
following filter is apparently applied:
  src/platform/graphics/mesa/display.cpp:
    monitor.filter_by_subsystem_and_type("drm", "drm_minor");
Mir does not use tag-based filtering.
Adding further udev_dbg() calls to passes_filter() reveals that, for the
received uevent device, udev_device_get_devtype() == NULL. That is the
reason it gets discarded, as the monitor filter only watches for
devtype "drm_minor".
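To make that concrete, here is a minimal sketch of subsystem/devtype matching as I understand it. FakeDevice, SubsystemFilter and passes_filter_sketch() are hypothetical stand-ins for illustration, not libudev types or functions, and the real passes_filter() also handles tag filters, which are left out here.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not libudev types.
struct FakeDevice {
    std::string subsystem;
    std::optional<std::string> devtype;  // std::nullopt models a NULL devtype
};

struct SubsystemFilter {
    std::string subsystem;
    std::optional<std::string> devtype;  // std::nullopt means "any devtype"
};

// Returns true if no filters are installed, or if the device matches
// at least one subsystem/devtype entry.
bool passes_filter_sketch(const std::vector<SubsystemFilter>& filters,
                          const FakeDevice& dev)
{
    if (filters.empty())
        return true;
    for (const auto& f : filters) {
        if (f.subsystem != dev.subsystem)
            continue;
        if (!f.devtype)
            return true;   // filter does not care about the devtype
        if (!dev.devtype)
            continue;      // filter wants a devtype, device has none
        if (*f.devtype == *dev.devtype)
            return true;
    }
    return false;
}
```

With a single ("drm", "drm_minor") entry, a "drm" device whose devtype is missing is rejected even though its subsystem matches, which is the behaviour observed in the failing runs.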
My current suspicion is that this is a race on the fake /sys "uevent"
file for the device: it gets read before it has been completely written
(or rather, synced to disk), so the "DEVTYPE=" property would be
missing. This is the kind of thing that would get aggravated under high
system load. Also, I ran the test case with just this patch:
     // sleeping between calls to fake_devices hides race conditions
  -  std::this_thread::sleep_for(std::chrono::microseconds{500});
  +  //std::this_thread::sleep_for(std::chrono::microseconds{500});
  +  sync();
... and it is now through 500 iterations without failure. This doesn't
prove that this is the bug as the sync() delays the iterations quite a
bit (each test now takes ~ 200 ms), but it's currently the only
plausible explanation that I have.
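To illustrate the suspected mechanism (not to prove it), here is a small sketch of how a reader that sees only a partially written "uevent" file would end up without a DEVTYPE. The helper name uevent_property() is made up, and the file contents in the usage note are invented sample data, not captured from the test.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper for illustration: looks up KEY in text made of
// "KEY=value" lines, the format used by sysfs uevent files.
std::optional<std::string> uevent_property(const std::string& contents,
                                           const std::string& key)
{
    const std::string prefix = key + "=";
    std::istringstream in(contents);
    std::string line;
    while (std::getline(in, line)) {
        // Match lines that begin with "KEY=" and return the value part.
        if (line.compare(0, prefix.size(), prefix) == 0)
            return line.substr(prefix.size());
    }
    return std::nullopt;  // models udev_device_get_devtype() == NULL
}
```

For a complete file such as "MAJOR=226\nMINOR=2\nDEVTYPE=drm_minor\n", looking up "DEVTYPE" yields "drm_minor". If the reader only sees the text up to, but not including, the DEVTYPE line, the lookup returns nothing and the device would fail the "drm_minor" filter.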
I'll create a similar test for that in umockdev's test suite, which will
make this slightly easier to debug and ensure it stays fixed.
** Also affects: umockdev (Ubuntu)
Importance: Undecided
Status: New
** Summary changed:
- Intermittent
mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test
failure
+ Intermittent
mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test
failure: device DEVTYPE is sometimes NULL
https://bugs.launchpad.net/bugs/1336671
Title:
Intermittent
mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler
test failure: device DEVTYPE is sometimes NULL
Status in Mir:
In Progress
Status in “umockdev” package in Ubuntu:
New
Bug description:
As seen in:
http://s-jenkins.ubuntu-ci:8080/job/mir-clang-utopic-amd64-build/799/console
To reproduce locally:
  bzr branch lp:mir/devel mir-devel && cd mir-devel
  mkdir build && cd build && cmake .. && make -j4
  umockdev-wrapper bin/mir_unit_tests \
    --gtest_filter=MesaDisplayTest.drm_device_change_event_triggers_handler \
    --gtest_repeat=-1 --gtest_break_on_failure
(Ignore the segfault when the test fails, it's a side effect of
--gtest_break_on_failure)
Running with strace or on a system with high load increases the chances that
we hit the problem. For example, running make -j4 in another mir branch while
running the tests does the trick for mir.