On Mon, 19 Sep 2016 07:59:37 -0500 Derek Foreman <der...@osg.samsung.com> said:
> On 19/09/16 05:08 AM, Carsten Haitzler (The Rasterman) wrote:
> > On Mon, 19 Sep 2016 11:07:15 +0200 Stefan Schmidt <ste...@osg.samsung.com>
> > said:
> >> Hello.
> >> On 16/09/16 21:11, Derek Foreman wrote:
> >>> derekf pushed a commit to branch master.
> >>> http://git.enlightenment.org/core/efl.git/commit/?id=a17ac66f0a0b089dde0b2e550523b0d59ec97f52
> >>> commit a17ac66f0a0b089dde0b2e550523b0d59ec97f52
> >>> Author: Derek Foreman <der...@osg.samsung.com>
> >>> Date: Thu Sep 15 16:05:25 2016 -0500
> >>> render_thread: Attempt to set affinity to a random fast core
> >>> We've been pinning the render thread for every EFL process to core 0.
> >>> This is a bit silly in the first place, but some big.LITTLE arm
> >>> systems, such as exynos 5422, have the LITTLE cores first.
> >>> On those systems we put all the render threads on a slow core.
> >>> This attempts to fix that by using a random core from the pool of fast
> >>> cores.
> >>> If we can't determine which cores are fast (ie: we're not on a
> >>> linux kernel with cpufreq enabled) then we'll continue doing what
> >>> we've always done.
> >> I had to revert this patch as it broke all efl builds for me. Locally
> >> and on Jenkins. Edje_cc segfaulted on the in tree edc files. Error
> >> message is in the revert commit message.
> >> From the description here this change would still be needed but in a
> >> non breaking way. :)
> > how about simply removing the pinning (affinity) entirely?
> I thought about that... The render thread's still going to take a
> performance hit if it bounces from processor to processor much (it's
> probably pathologically bad for cache invalidation?)
> Also, if it bounces around on the "LITTLE" cores on a big.LITTLE system
> it'll have really bad performance characteristics...
> There's a group working on improving the scheduler to better handle non
> uniform multi-processor systems, so eventually we shouldn't need this
> anymore, but I think for now it's generally a win.
then this would need a lot more abstracting. like some special enums that ask
for a "fast core". also imagine 2 apps that pin to the same core. the same
faster (big) core has to split rendering between 2 processes now. remember this
uses cpu_set_t under the covers ... so a special enum in eina_thread that maps
to -2, -3, -4, etc. that THEN uses cpu_set_t to set up the correct cpu set
which could include ALL big cores thus allowing it to run on any one of them.
your changes to eina_thread really didn't do this.
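
to make the "cpu set that includes ALL big cores" idea concrete, here is a minimal sketch using the plain linux/glibc affinity calls. it is not the eina_thread code: build_core_set() and allow_thread_on_cores() are hypothetical helper names, and which core ids count as "big" is assumed to be known already (for example 4-7 on some 8 core big.LITTLE parts, but that varies by board).

```c
#define _GNU_SOURCE /* needed for cpu_set_t macros and sched_setaffinity() */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* hypothetical helper: fill `set` with every valid core id in `cores`.
 * returns the number of distinct cores now in the set. */
int
build_core_set(cpu_set_t *set, const int *cores, size_t n)
{
   assert(set != NULL);
   CPU_ZERO(set);
   for (size_t i = 0; i < n; i++)
     {
        /* skip ids that cannot be represented in a cpu_set_t */
        if (cores[i] >= 0 && cores[i] < CPU_SETSIZE)
          CPU_SET(cores[i], set);
     }
   return CPU_COUNT(set);
}

/* hypothetical helper: let the calling thread run on ANY core in `cores`
 * instead of pinning it to a single one.
 * returns 0 on success, EINVAL for an empty set, or errno on failure. */
int
allow_thread_on_cores(const int *cores, size_t n)
{
   cpu_set_t set;

   if (build_core_set(&set, cores, n) == 0) return EINVAL;
   /* pid 0 means "the calling thread" for sched_setaffinity() on linux */
   if (sched_setaffinity(0, sizeof(set), &set) != 0) return errno;
   return 0;
}
```

with a set like this the scheduler is still free to move the thread between the listed cores, so 2 apps asking for "fast" do not automatically end up fighting over the same one.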
there's also some fun bits - some vendors use cpu hotplug for power
management... thus eina_cpu_count() often returns 1 on an 8 core system.
hooray. either way right now the best thing to do might be to disable affinity
until the above can be fixed 'right' like having specific parsing
of /proc/cpuinfo to figure out the real system we're on and how many cores
might be available, which ones are or are not "big" or "little" or otherwise
non-homogeneous (evas' thread renderer started life when the 2 core athlons came
out and well.. you had symmetric cores then... as you did when you have
multiple REAL cpu's before that and still do today). big.little upsets the
applecart with homogeneity. should rendering be forced on big always? what if
the rendering is really simple this frame? what if we're on battery and saving
power is more important than squeezing some extra fps? :)
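
a rough sketch of what "figure out which cores are big" could look like, going by the cpufreq route the original commit message mentions rather than /proc/cpuinfo. assumptions: the kernel exposes cpuinfo_max_freq under /sys/devices/system/cpu/cpuN/cpufreq/, "fast" simply means "shares the highest max frequency", and there are at most 64 cores. core_max_khz(), fast_core_mask() and configured_cores() are made-up names, not eina api.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_CORES 64 /* assumption: one bit per core in a 64 bit mask */

/* number of cores the system is configured with. on glibc this counts
 * configured processors, not just the ones currently online. */
long
configured_cores(void)
{
   return sysconf(_SC_NPROCESSORS_CONF);
}

/* read the max frequency (kHz) of one core from sysfs.
 * returns 0 if the file is missing or unreadable. */
unsigned long
core_max_khz(int core)
{
   char path[128];
   unsigned long khz = 0;
   FILE *f;

   snprintf(path, sizeof(path),
            "/sys/devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", core);
   f = fopen(path, "r");
   if (!f) return 0;
   if (fscanf(f, "%lu", &khz) != 1) khz = 0;
   fclose(f);
   return khz;
}

/* given per-core max frequencies, return a bitmask of the cores that share
 * the highest non-zero frequency. returns 0 if no frequency is known. */
unsigned long long
fast_core_mask(const unsigned long *khz, int n)
{
   unsigned long best = 0;
   unsigned long long mask = 0;

   assert(n >= 0 && n <= MAX_CORES);
   for (int i = 0; i < n; i++)
     if (khz[i] > best) best = khz[i];
   if (best == 0) return 0;
   for (int i = 0; i < n; i++)
     if (khz[i] == best) mask |= 1ULL << i;
   return mask;
}
```

a caller would loop from 0 to configured_cores() - 1, collect core_max_khz() for each, and treat a mask of 0 as "don't know, leave affinity alone". this says nothing about the battery/simple-frame policy questions above, it only answers "which cores could be fast".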
it's all hairy because tbh there is not enough metadata available to
userspace OR to the kernel. :( i've proposed this various times to kernel devs
"we need an interface to set scheduler/power management hints on threads like
'this is about to consume a lot of cpu for a very important user-facing task
like rendering, so having a high performance core spun up to full clock rate
RIGHT NOW would be really good!, but we're going to get in and get out asap and
won't sit and spin for 100's of ms here'". ... so the kernel can make good
decisions as to where to schedule this thread, what to do with the clockrate,
memory bus clock, and more. we should call this at the start of rendering and
hope that by the time this syscall returns the kernel has decided what to do.
it may move us to a new core, or just clock up the current one, or do nothing,
but at least it knows what we intend to do. we're not running this thread just
to do some async i/o or parsing an xml datastream in the background or
whatever. this task we want to have as low latency as possible until it's
done, so "pretty please with a cherry on top" do the best you can to make it
happen given overall system constraints.
but such things don't exist and every kernel dev poopoos the idea thinking
there is some magical way to make this happen when the kernel has no CLUE what
code is about to execute, or the only alternatives are nasty things like asking
to raise priority of this thread explicitly, or we'd just keep a thread pool
and just have a thread that when we wake up from handling events after idle,
we'd wake up and that thread will sit with for (;;) check_go(); in a tight poll
for a go flag spinning cpu as fast as it can to hopefully force the kernel to
clock up the core it's running on... or we need root access to /sys to mess
with cpufreq and pin this thread to a specific core and then force it to top
clockrate, and do our own cpu migrating ourselves in userspace co-operatively
by changing our affinity explicitly etc. etc. ...
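
for reference, the "spin on a go flag" hack described above would look roughly like this with c11 atomics and pthreads. this is only an illustration of the nasty alternative, not something in efl: Spinner, spinner_run() and spinner_demo() are invented names, and the "render" is a stand-in counter increment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct
{
   atomic_int go;     /* set to 1 by the main loop when a frame is due */
   atomic_int frames; /* number of frames "rendered" so far */
} Spinner;

/* worker thread: tight poll on the go flag, burning a full core the whole
 * time in the hope that the kernel clocks that core up. */
void *
spinner_run(void *data)
{
   Spinner *s = data;

   while (!atomic_load_explicit(&s->go, memory_order_acquire))
     ; /* spin */
   atomic_fetch_add(&s->frames, 1); /* stand-in for the actual render */
   return NULL;
}

/* start the worker, raise the go flag, wait for it to finish.
 * returns the number of frames rendered, or -1 if the thread failed. */
int
spinner_demo(void)
{
   Spinner s;
   pthread_t th;

   atomic_init(&s.go, 0);
   atomic_init(&s.frames, 0);
   if (pthread_create(&th, NULL, spinner_run, &s) != 0) return -1;
   /* ... the main loop would finish handling events here ... */
   atomic_store_explicit(&s.go, 1, memory_order_release);
   pthread_join(th, NULL);
   return atomic_load(&s.frames);
}
```

the cost is obvious from the loop: the worker eats 100% of a core between wakeup and the go flag, which is exactly why a proper hint interface would be preferable.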
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) ras...@rasterman.com
enlightenment-devel mailing list