[gentoo-amd64] Re: How well does your dual-head window manager handle games?

Duncan Tue, 02 Feb 2010 04:04:50 -0800

Mark Knecht posted on Mon, 01 Feb 2010 06:42:24 -0800 as excerpted:

> On the new machine I've set it up as dual-head which is working nicely
> for all the basic stuff, and in general pretty nicely for running
> VMWare/WinXP on the second screen for most of the day. I have a few
> issues, like the mouse can become __very__ laggy in VMWare at times
> but other than that all the basics are there and working well enough
> to get some work done. I'm using XFCE4 at the moment.


I wouldn't do vmware as it's servantware, and thus don't know a /whole/ 
lot about it, but here's a bit of general wisdom on lagginess/latency 
issues.  Was it you that did a bunch of sound related stuff?  If so, you 
likely know (and have it set as appropriate) some of this already.

First, what's your kernel tick time set for, 100, 250, 300, 1000 ticks per 
second?  Obviously higher ticks will help with latency, but it negatively 
affects thruput.  Also note that with SMP (multiple CPUs/cores), each one 
ticks at that rate, so you can often turn down the ticks a notch or two 
from what you'd normally have to run, if you're running SMP.

Second, what's your kernel preemption choice?  No-preemption/server, 
voluntary-preemption/desktop, or full-preemption/low-latency-desktop?  
Again, there's a trade-off between latency and thruput.  If you're worried 
about mouse lagginess, server isn't appropriate, but you can choose from 
the other two.
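
To see what a given kernel was built with for those first two points, 
something like the following might do.  It's only a sketch: it assumes the 
config the kernel was built from is available as a plain file 
(/usr/src/linux/.config being the usual spot on Gentoo), and the function 
name is made up for illustration.

```shell
# Show the tick rate and preemption model from a kernel .config file.
# show_latency_config is a made-up helper name, not a standard tool.
show_latency_config() {
    # CONFIG_HZ=<n> is the tick rate.  Of the three preemption choices
    # (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT),
    # only the selected one appears as "=y"; the others are commented out.
    grep -E '^CONFIG_(HZ|PREEMPT_NONE|PREEMPT_VOLUNTARY|PREEMPT)=' "$1"
}

# Example:
#   show_latency_config /usr/src/linux/.config
```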

Third, there's additional low-latency kernel patches available...  I'll 
leave that alone as I run vanilla kernel.

Fourth, there's I/O scheduling.  Due to the way I/O works, often, the 
kernel stops doing much of whatever else it was doing when it's handling
I/O.  What I/O scheduler are you running, and have you noted the disk 
activity LEDs blinking furiously (or conversely, no disk activity at all) 
during your latency?  How's your memory situation?  How far into swap do 
you typically run?  Do you run /tmp and/or /var/tmp on tmpfs?  
Particularly when you're emerging stuff in the background, having 
PORTAGE_TMPDIR pointed at a tmpfs can make a pretty big difference, both 
in emerge speed, and in system responsiveness, because there's much less
I/O that way.  That's assuming, of course, that you have at least a couple 
gigs of memory and aren't already starved for memory with your typical 
application load.
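
A quick way to answer those questions from a shell might look like this.  
Treat it as a sketch: the helper name is invented, and sda is just an 
assumed device name, so substitute your own disk.

```shell
# active_io_scheduler: read a line such as "noop anticipatory deadline [cfq]"
# on stdin and print the bracketed name, which is the scheduler in use.
# (Hypothetical helper name.)
active_io_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Examples (sda is an assumed device name):
#   active_io_scheduler < /sys/block/sda/queue/scheduler
#   df -T /tmp /var/tmp    # the Type column reads tmpfs if they're on tmpfs
#   free -m                # shows how far into swap you currently are
```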

Fifth, priority.  Have you tried either higher priority for the vmware 
stuff or lower priority for other things, portage, anything else that may 
be hogging CPU?  (For portage, I like to set PORTAGE_NICENESS=19, which 
automatically sets scheduler batch mode for it as well.  The priority is 
as low as possible so it doesn't interfere with other things to the extent 
possible, while the batch mode means it gets longer timeslices, too, thus 
making it more efficient with what it does get.)
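
Pulling together the two portage settings mentioned so far, a make.conf 
fragment might look like the following.  The /tmp value is an assumption 
(it only helps if /tmp really is a tmpfs mount with enough room to build 
in), so adjust it to wherever your tmpfs lives.

```shell
# /etc/make.conf fragment (a sketch; adjust to taste)
PORTAGE_NICENESS=19      # run emerge at the lowest priority
PORTAGE_TMPDIR="/tmp"    # assumption: /tmp is a tmpfs mount on this system
```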

The above, save for priority, is mostly kernel related, so should have an 
effect regardless of whether your vmware vm is mostly kernel or userland 
implementation.  The below is mostly for userland so won't work as well if 
vmware is mostly kernel.  I don't know.

Sixth, are you using user-group or control-group (aka cgroup) kernel 
scheduling, or not, and how do you have it configured?  The kernel options 
are under general setup.  Cgroup scheduling gets rather complicated, but 
user-group scheduling is reasonably easy to configure, and it can make a 
**BIG** difference on a highly loaded system.  Thus, I'd suggest user-
group scheduling.

To enable user-group scheduling, enable Group CPU scheduler, and 
(normally) Group scheduling for SCHED_OTHER, which is everything OTHER 
than real-time threads.  I leave the scheduling for SCHED_RR/FIFO off, as 
unless you know what you are doing and have specific reason to mess with 
real-time scheduling, it's best NOT to mess with it, because it's a VERY 
easy way to seriously screw your system!

Again, you probably do NOT want to mess with control group support, unless 
you have specific needs beyond what user-group scheduling will do for you, 
because that gets quite complicated.  Therefore, leave that option off, 
and under Basis for grouping tasks, make sure it says "(user id)".  
That'll be the only option unless you have control group support enabled.
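
If you'd rather check an existing config than click thru menuconfig, a 
small test like this might do.  The CONFIG_* symbol names are what those 
menu entries are called in kernels of this vintage as best I recall, so 
treat them as an assumption and verify against your own .config.  The 
function name is invented.

```shell
# has_user_sched <config-file>: succeed if the config enables group
# scheduling for SCHED_OTHER with tasks grouped by user id.
# (Hypothetical helper; symbol names assumed, see the note above.)
has_user_sched() {
    grep -q '^CONFIG_FAIR_GROUP_SCHED=y' "$1" &&
    grep -q '^CONFIG_USER_SCHED=y' "$1"
}

# Example:
#   has_user_sched /usr/src/linux/.config && echo "user-group scheduling is on"
```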

Now, how do you use it?  Simple.  For each user currently running at least 
one application, there's a /sys dir with the user id number (not name, 
number, you need to know the number), /sys/kernel/uids/<uid>.  In this 
directory, there's a file, cpu_share.

The content of this file is the relative CPU share the user will get, 
compared to other users, when the system is under load and thus has to 
ration CPU time.  The default share for all users save for root is 1024.  
Root's default share is double that, 2048.
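
Changing a share is then just a matter of writing a number into that 
file.  Here's a minimal sketch of a helper for it.  The function name and 
the UIDS_DIR override are invented for illustration (the override is there 
so it can be tried against a scratch directory), and writing to the real 
/sys tree needs root.

```shell
# Directory holding the per-UID entries; overridable for dry runs.
UIDS_DIR=${UIDS_DIR:-/sys/kernel/uids}

# set_cpu_share <numeric-uid> <share>
# Writes a new relative CPU share for that UID.  (Hypothetical helper.)
set_cpu_share() {
    uid=$1
    share=$2
    file="$UIDS_DIR/$uid/cpu_share"
    # The directory only exists while the user has something running.
    if [ ! -e "$file" ]; then
        echo "no entry for uid $uid (nothing running as that user?)" >&2
        return 1
    fi
    echo "$share" > "$file"
}

# Example, bumping a login user to root's default share
# ("someuser" is a placeholder login name):
#   set_cpu_share "$(id -u someuser)" 2048
```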

So here's how it works.  With user-group scheduling enabled, instead of 
priority alone determining scheduling, now priority and user determine 
scheduling.  Once the system is under load so it matters, no user can take 
more than their share, regardless of what priority their apps are running 
at.  If you want a particular user to get more time, double its share.  If 
you want to restrict a user, halve its share.  Just keep in mind that root 
has a 2048 share by default, so it's wise to be a bit cautious about 
increasing too many users up to that or beyond unless you boost root as 
well, just to be sure.  Various system housekeeping threads, kernel 
threads, etc, use time from the root share, so you want to be a bit 
careful about increasing other users above it, or the housekeeping 
threads, disk syncs, etc, might not have the time to run that they need.  
However, increasing just one single user to say 4096 shouldn't starve root 
too badly even if that user gets a runaway app, as root will still be 
getting half that time, as long as everything else remains at 1024 or 
below.  But obviously, you won't want to put say the portage user at 4096!

I routinely bump my normal user to 2048 along with root, when I'm running 
emerges, etc.  This is with FEATURES="userfetch userpriv usersync" among 
others, so portage is spending most of its time as the portage user, thus 
with its default 1024 share.  Boosting my normal user to 2048 thus ensures 
that it (along with root) gets twice the time that the portage user does.  
But even should one of my normal user apps go into runaway, root still 
gets just under 40% of the CPU if it needs it.  Root and my normal user 
would each be getting double the portage user, so nearly 40% each, while 
portage would get nearly 20%.  (The "nearly" is because other users, not 
being in runaway, would still take a percent or two.)  That should be 
plenty to login as root and kill something, if I have to, or to shut down 
the system in an orderly way, or do whatever else I'd need to do.

Even if I were to run my normal user at 4096 and it had a runaway, 
it would get 4 shares, portage would get one, and root would get two, so 
even then, root would get nearly 2/7 or about 28% share, with the runaway 
user getting double that or about 56% and portage getting about 14%.  Even 
28% share for root should be enough, so that's reasonably safe.  However, 
I'd be extremely cautious about going over 4096, or increasing a second 
user's share to that too, unless I increased root's share as well.
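
The arithmetic behind those figures is simple enough to script.  This is 
just a sketch with an invented function name, and it assumes every listed 
user is a CPU hog at the same priority.  Real numbers come out a bit lower 
because the other, non-runaway users take a percent or two as well.

```shell
# share_percent <share>...
# Prints each share as a whole-number percentage of the total (truncated).
# (Hypothetical helper name.)
share_percent() {
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    out=
    for s in "$@"; do
        out="$out${out:+ }$((s * 100 / total))"
    done
    echo "$out"
}

# Order used here: root, normal user, portage
#   share_percent 2048 2048 1024   # prints: 40 40 20
#   share_percent 2048 4096 1024   # prints: 28 57 14
```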

That's actually simplifying it some, tho, as the above assumes all the CPU 
hogs are running at normal 0 priority/niceness.  But as I mentioned, I 
have PORTAGE_NICENESS=19, so it's running at idle priority, which would 
lower its claim to the portage user share dramatically.  Basically, at 
idle priority, it'd get very little share if there was another runaway 
(normal priority) process, as ANY user.  (The scheduler /does/ normally 
give /every/ process at least one timeslice per scheduling period, even at 
idle priority, to prevent priority inversion situations in case of lock 
contention and the like.)  So the above percentage scenarios would be more 
like 48/48/1/3 (root/user/portage/other) in the 2048/2048/1024s case, and 
32/65/1/2 in the 2048/4096/1024s case.  Basically, the portage user, even 
tho it's using all the CPU it can get, would still fall into the noise 
range along with other users, because it's running at idle priority, and 
root would thus get close to half or close to a third of the CPU, with the 
normal user at equal share or double share of root, respectively.

I've had very good results using that setup.  Just for curiosity's sake, I 
tried running ridiculous numbers of make jobs, to see how the system 
handled it.  With this setup (PORTAGE_NICENESS=19, portage user at 1024 
share, root at 2048, and normal user at either 1024 or 2048), I can 
increase make jobs without limit and still keep a reasonably usable 
system, as long as the memory stays under control.  This is MUCH more so 
than simply running PORTAGE_NICENESS=19 but without per-user scheduling 
enabled.  In practice, therefore, the limit on make jobs is no longer CPU 
scheduling, but the amount of memory each job uses.  I set my number of 
make jobs so that I don't go into swap much, if at all, even with 
PORTAGE_TMPDIR pointed at tmpfs, because swapping is I/O, and I/O, due to 
the way the hardware works, increases latency, sometimes unacceptably.

Actually, my biggest thread test has been compiling the kernel, since it's 
so easily parallelizable, to a load average of several hundred if you let 
it, without using gigs and gigs of memory (yes it takes some, but nowhere 
near what a typical compile would take at that number of parallel jobs) to 
do it.  I do my kernel compiles as yet another user (what I call my 
"admin" user), so like portage, it gets the default 1024 share.  But I 
don't set niceness so it's running at normal 0 priority and taking its 
full share against other 0 priority users.  Compiling the kernel, I can 
easily run over a hundred parallel make jobs without seriously stressing 
the system.  But even there, with user scheduling enabled and at normal 
priority so the kernel compile is taking all the share it can, the memory 
requirements are the bottleneck, not the actual jobs or load average, 
because the kernel per-user scheduling is doing its job, giving my other 
non-hog users and root the share they need to continue running normally.

So assuming vmware is running in userspace and thus affected by priority 
and user-groups, I'd definitely recommend setting up user-groups and 
fiddling with its share, as well as that of the rest of the system, along 
with the other steps above.

The one caveat with user-group scheduling, is that the
/sys/kernel/uids/<uid> directories are created and destroyed dynamically, 
as apps run and terminate as those users.  It's thus not (easily) possible 
to set a static policy, whereby a particular UID /always/ gets a specific 
non-default share.  There was a writeup I read back when the feature was 
first introduced, that was supposed to explain how to set up an automatic 
handler such that every time a particular UID appeared, it'd write a 
particular value to its cpu_share file, but as best I could tell, the 
writeup was already out of date, as the scripts that were supposed to be 
called automatically according to the writeup, were never called.  The 
kernel hotplugging (and/or udev, or whatever was handling it) had changed 
out from under the documentation, even as the feature was going thru the 
process of getting the peer approval necessary to be added to the mainline 
kernel.  So I've never had an automatic policy setup to do it.  It could 
certainly be done using a file-watch (inotify/dnotify, etc, or polling of 
some sort), without relying on the hotplugging mechanism that was supposed 
to work that I could never get to work, but I've not bothered.  I've 
simply created scripts to echo the desired numbers into the desired files 
when I invoke them, and run them manually when I need to.  That has worked 
well enough for my needs.
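
For the polling route, a sketch along these lines might work.  It is not 
any script actually in use, just an illustration: the function name, the 
uid:share pair format, the UIDS_DIR override and the 5 second interval are 
all arbitrary choices made for the example.

```shell
# Directory holding the per-UID entries; overridable for dry runs.
UIDS_DIR=${UIDS_DIR:-/sys/kernel/uids}

# apply_share_policy "<uid>:<share>" ...
# One pass over the policy: for each UID whose directory exists right now,
# write the wanted share if the current value differs.  (Hypothetical helper.)
apply_share_policy() {
    for pair in "$@"; do
        uid=${pair%%:*}
        share=${pair#*:}
        file="$UIDS_DIR/$uid/cpu_share"
        # UIDs with nothing running have no directory; skip them quietly.
        if [ -w "$file" ] && [ "$(cat "$file")" != "$share" ]; then
            echo "$share" > "$file"
        fi
    done
}

# Polling loop, run as root (1000 is a placeholder UID):
#   while :; do apply_share_policy 1000:2048; sleep 5; done
```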

(Now you see why I'm not going into cgroups?  user-group scheduling is 
actually quite simple.  Imagine the length of the post if I was trying to 
explain cgroups!)

> The one place where I've been a bit disappointed is when the VGA
> drivers need to switch resolutions to play a game like Tux Racer then
> instead of two desktops I'm seeing one desktop duplicated on both
> monitors. Is this normal or is there some general way to control this?
> I'd really like the game on one monitor and just have the other stay
> black.

The problem here is that most resolution switchers simply assume a single 
monitor.  Before the X RandR extension, there was really no standard way 
to reliably handle multiple monitor setups (xinerama, merged-framebuffer, 
and proprietary or semi-proprietary methods like that used by the nvidia 
and fglrx drivers, were all in use at various times by various 
hardware/drivers, and for all I know there were others as well), so assuming a 
single monitor was pretty much the best they could do.

RandR has solved the standardization problem, but few games have upgraded 
to it, in part because it's apparently "rocket science" to properly 
program it.  The xrandr CLI client works, but all too often, the X 
environment tools are simply broken.  KDE for example has had a tool 
that's supposed to handle multiple monitors, changing resolutions, etc, 
for some time, but on all three sets of hardware and drivers I've tried it 
on, both the kde3 version and the kde4 version thru 4.3.4, it has screwed 
things up royally if you're using more than one monitor.  Only xorg's own 
xrandr gets it right, and that's a CLI client, with a slew of options to 
read about and try to master, before you can properly run it.  I've 
scripted a solution here using it, hard-coding some of the options I don't 
change into the script (could be a config file) thus making my script 
simple enough to run from the command line (or invoke from a menu entry) 
without having to remember all the complicated syntax, but that's not 
going to work for the CLI-phobic.  And if the X environments can't get it 
working correctly for many users even with the documentation and the 
xrandr code to follow, what are the games folks supposed to do?  So they 
simply continue to assume only a single monitor... and screw things up for 
those of us with more than one, at least if we prefer to run them in other 
than clone mode.

Because that's what's happening.  When the games, etc, trigger the old 
single-monitor resolution change API, it causes xorg to switch to clone 
mode, running all monitors at the same resolution, showing the same thing.

FWIW, the solution I've found, as I mentioned, is a script setup to invoke 
my preferred resolutions, in my preferred non-clone modes, retaining my 
preferred "stacked" monitor orientation, by invoking xrandr with the 
appropriate parameters to do so.

Thus I use my script (which uses xrandr) to set the resolution I want, and 
set the game not to change resolution -- to run in a window or whatever, 
instead.  I run kde, and with kwin's per-app and per-window config 
options, I set it up to always put the windows for specific games at 
specific locations, sometimes without window borders etc.  Between that 
and triggering the resolution settings I want with my xrandr script, I can 
get the game running in a window, but that window set to exactly the size 
and at exactly the location of the monitor I want it to run on, while the 
other monitor continues at its configured size and showing the desktop or 
apps it normally shows.

"Works for me!" =:^)

>    More disturbing is when I exit the game I'm left with both desktops
> displaying the same things and neither is exactly my original first or
> second desktop but rather a combination of the two which is fairly
> strange. (Desktop #1 icons with Desktop #2 wallpaper)

Both desktops (monitors, I assume, that's quite different from virtual 
desktops, which is how I'd normally use the term "desktop") displaying the 
same thing is simply clone mode.  You can try using your X environment 
resolution tool (if xfce has such a thing, kde does and I think gnome 
does) to switch back to what you want, but as I said, don't be surprised 
if it doesn't work as expected, because they've really had problems 
getting the things working right.  xrandr gets it right, and you'd /think/ 
they could if /nothing/ else read its code and use similar tricks, it /is/ 
open source, after all, but kde certainly hasn't gotten it right, at least 
not for many drivers and hardware, and from what I've read, gnome's 
version isn't a lot better.

But, if you're up for a bit of reading, you can figure out how xrandr 
works well enough to get it to do what you want.  Here's an example, 
actually the debug output of the script I run, showing the xrandr command 
as it's setup and invoked by the script (all one command line):

xrandr --verbose --fb 1920x2400 --output DVI-0 --mode 1920x1200 \
  --panning 1920x1200+0+0/1920x1200+0+0/20/20/20/20 \
  --output DVI-1 --mode 1920x1200 \
  --panning 1920x1200+0+1200/1920x1200+0+1200/20/20/20/20

That results in:

1) an overall framebuffer resolution of 1920x2400

2) output DVI-0 being set to resolution 1920x1200, with its top-left 
corner at position 0,0.

3) output DVI-1 being set to a similar resolution (I have two of the same 
model of monitor, 1920x1200 native resolution), but with its top-left 
corner at position 0,1200, thus, directly under DVI-0.

The panning mode stuff (except for the positioning bit) wouldn't be 
necessary here as there's no panning to do, but those are the script 
defaults.  For use of panning mode, see this one:

xrandr --verbose --fb 1920x2400 --output DVI-0 --mode 1280x800 \
  --panning 1920x1200+0+0/1920x1200+0+0/20/20/20/20 \
  --output DVI-1 --mode 1280x800 \
  --panning 1920x1200+0+1200/1920x1200+0+1200/20/20/20/20

This keeps the same overall framebuffer size and output orientation 
(stacked), but the outputs are both run at 1280x800, with the panning 
domain set for each one such that as the mouse gets to 20 px from any edge 
of the 1280x800 viewport, it moves the viewport within the corresponding 
1920x1200 panning domain.
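
Since the --panning value is the fiddly part, here's a sketch of how a 
script might assemble it.  The fields are panning area, then tracking 
area, then the left/top/right/bottom borders, per the xrandr manpage.  The 
function name is invented, and it assumes (as in the commands above) that 
the tracking area equals the panning area and all four borders are the 
same.

```shell
# panning_arg <width> <height> <x> <y> <border>
# Builds an xrandr --panning value of the form
#   WxH+X+Y/WxH+X+Y/left/top/right/bottom
# with identical panning and tracking areas and one border width for all
# four sides.  (Hypothetical helper name.)
panning_arg() {
    area="${1}x${2}+${3}+${4}"
    printf '%s/%s/%s/%s/%s/%s\n' "$area" "$area" "$5" "$5" "$5" "$5"
}

# Example, lower monitor of the stacked pair:
#   xrandr --output DVI-1 --mode 1280x800 \
#     --panning "$(panning_arg 1920 1200 0 1200 20)"
```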

Here's one with different resolutions, and with panning when the mouse 
reaches the edge itself (instead of 20 px in) on the lower resolution and 
position one (DVI-1), so I can run a game there, without it trying to pan 
out as near the edge.  I then put the viewport over the game and let the 
game grab the mouse, so I can then play the game without having to worry 
about panning.  If I need to, I can have it "ungrab" the mouse, and have 
panning again on the lower one, or move to the "fixed" upper one and do 
stuff there.

xrandr --verbose --fb 1920x2400 --output DVI-0 --mode 1920x1200 \
  --panning 1920x1200+0+0/1920x1200+0+0/20/20/20/20 \
  --output DVI-1 --mode 960x600 \
  --panning 1920x1200+0+1200/1920x1200+0+1200/

When I'm finished with the game, or if I want to run normal resolution and 
do something else for a bit, I just run that first command again, and it 
returns me to normal mode as set by that first command.

Unfortunately, kde4 still has a few bugs with multiple monitors, 
especially when switching resolutions.  As mentioned, the kde4 resolution 
switcher itself is entirely screwed up as all it can handle is clone mode 
(there's no way to set separate non-identical top-left corners for each 
monitor), but there's bugs with the plasma desktop as well.  If I do 
happen to select clone mode, or disable one of the monitors using xrandr, 
upon return to normal mode, plasma-desktop is screwed up.  I can fix it 
without restarting X/kde, but it's a hassle to do so, and somewhat trial 
and error, zooming in and out the various plasma "activities", until I get 
it setup correctly once again.  Hopefully, 4.4 has improved that as well.  
I read it has.  We'll see...

>    I'm wondering if other environments handle this better. XFCE is
> pretty lightweight, which I like. I'd gone away from Gnome because of
> the time spent maintaining it on Gentoo but on this machine it probably
> wouldn't be all the bad. Not sure I want KDE but I'm curious as to
> whether anything solves this problem?

Well... kde 3 worked reasonably well in this regard, in that the desktop 
at least stayed put.  Its resolution switcher wasn't much good either, 
tho.  I used X's ctrl-alt-numplus/numminus zooming while it worked, then 
developed the xrandr scripts I still use today when X switched to randr 
based switching and the numplus/numminus zooming didn't work any more.  
But as you can tell, I'm rather frustrated with kde4.

But definitely try xrandr.  It's a pain to learn as it's all CLI options 
not point and click, but it's remarkably good at doing what it does, once 
you know how to run it, and possibly hack up a script or several to take 
the complexity out of it.

>    Logging out of XCFE and then running startx gets everything back
> the way I want, and I don't think I'll play Linux games much, but I'm
> curious as to how well other environments handle this.

As explained, the base problem is that games assume single monitor, which 
X construes as a command to go into clone mode.  The solution is to use an 
external app (such as the xrandr invoking scripts I use) to set the 
resolutions you want, and don't invoke the games' options to change 
resolution or whatever, just have them run in a window.  Then match the 
window size to your desired resolution (enforcing it using your window 
manager, if that's more convenient or necessary), and invoke the script 
(or other external to the game resolution switcher app) changing the 
resolution right before you run the game.

Alternatively, since we're talking about a script already, you could set 
it up so the script runs xrandr to change the resolution as desired, then 
runs the game, then when the game is done, changes the resolution back.
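
A wrapper of that sort could be sketched as below.  The function name is 
invented, and the set and restore commands are passed in as strings so the 
same wrapper works with whatever xrandr invocations (or scripts wrapping 
them) you've settled on.

```shell
# run_at_mode <set-cmd> <restore-cmd> <game> [args...]
# Runs <set-cmd>, then the game, then <restore-cmd>, whether or not the
# game exited cleanly, and returns the game's exit status.
# (Hypothetical helper name.)
run_at_mode() {
    set_cmd=$1
    restore_cmd=$2
    shift 2
    # Don't start the game if the resolution change itself failed.
    sh -c "$set_cmd" || return 1
    "$@"
    status=$?
    sh -c "$restore_cmd"
    return $status
}

# Example (xr-game, xr-normal and some-game are placeholder names):
#   run_at_mode "$HOME/bin/xr-game" "$HOME/bin/xr-normal" some-game
```

Note that this won't restore anything if the wrapper itself gets killed, 
so it's worth keeping the "normal mode" command handy on its own too.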

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
