On Mon, 19 Sep 2016 07:59:37 -0500 Derek Foreman <der...@osg.samsung.com> said:
> On 19/09/16 05:08 AM, Carsten Haitzler (The Rasterman) wrote:
> > On Mon, 19 Sep 2016 11:07:15 +0200 Stefan Schmidt <ste...@osg.samsung.com>
> > said:
> >
> >> Hello.
> >>
> >> On 16/09/16 21:11, Derek Foreman wrote:
> >>> derekf pushed a commit to branch master.
> >>>
> >>> http://git.enlightenment.org/core/efl.git/commit/?id=a17ac66f0a0b089dde0b2e550523b0d59ec97f52
> >>>
> >>> commit a17ac66f0a0b089dde0b2e550523b0d59ec97f52
> >>> Author: Derek Foreman <der...@osg.samsung.com>
> >>> Date:   Thu Sep 15 16:05:25 2016 -0500
> >>>
> >>>     render_thread: Attempt to set affinity to a random fast core
> >>>
> >>>     We've been pinning the render thread for every EFL process to core 0.
> >>>     This is a bit silly in the first place, but some big.LITTLE arm
> >>>     systems, such as exynos 5422, have the LITTLE cores first.
> >>>
> >>>     On those systems we put all the render threads on a slow core.
> >>>
> >>>     This attempts to fix that by using a random core from the pool of fast
> >>>     cores.
> >>>
> >>>     If we can't determine which cores are fast (ie: we're not on a
> >>>     linux kernel with cpufreq enabled) then we'll continue doing what
> >>>     we've always done.
> >>
> >> I had to revert this patch as it broke all efl builds for me. Locally
> >> and on Jenkins. edje_cc segfaulted on the in-tree edc files. The error
> >> message is in the revert commit message.
> >>
> >> From the description here this change would still be needed, but in a
> >> non-breaking way. :)
> >
> > how about simply removing the pinning (affinity) entirely?
>
> I thought about that... The render thread is still going to take a
> performance hit if it bounces from processor to processor much (it's
> probably pathologically bad for cache invalidation?)
>
> Also, if it bounces around on the "LITTLE" cores on a big.LITTLE system
> it'll have really bad performance characteristics...
> There's a group working on improving the scheduler to better handle
> non-uniform multi-processor systems, so eventually we shouldn't need this
> anymore, but I think for now it's generally a win.

then this would need a lot more abstracting, like some special enums that ask
for a "fast core". also imagine 2 apps that pin to the same core: the same
faster (big) core now has to split rendering between 2 processes. remember
this uses cpu_set_t under the covers... so a special enum in eina_thread that
maps to -2, -3, -4, etc. that THEN uses cpu_set_t to set up the correct cpu
set, which could include ALL big cores, thus allowing the thread to run on any
one of them. your changes to eina_thread really didn't do this.

there's also some fun bits - some vendors use cpu hotplug for power
management... thus eina_cpu_count() often returns 1 on an 8 core system.
hooray.

either way, right now the best thing to do might be to disable affinity until
the above can be fixed 'right', e.g. with specific parsing of /proc/cpuinfo to
figure out the real system we're on, how many cores might be available, and
which ones are or are not "big" or "little" or otherwise non-homogeneous.
(evas' thread renderer started life when the 2-core Athlons came out, and
well... you had symmetric cores then, as you did when you had multiple REAL
cpus before that, and still do today.) big.LITTLE upsets the applecart of
homogeneity. should rendering be forced onto big cores always? what if the
rendering is really simple this frame? what if we're on battery and saving
power is more important than squeezing some extra fps? :) it's all hairy
because tbh there is not enough metadata available to userspace OR to the
kernel. :(
i've proposed this various times to kernel devs: "we need an interface to set
scheduler/power-management hints on threads, like 'this is about to consume a
lot of cpu for a very important user-facing task like rendering, so having a
high-performance core spun up to full clock rate RIGHT NOW would be really
good! but we're going to get in and get out asap and won't sit and spin for
100's of ms here'"... so the kernel can make good decisions as to where to
schedule this thread, what to do with the clock rate, memory bus clock, and
more. we would call this at the start of rendering and hope that by the time
the syscall returns the kernel has decided what to do. it may move us to a new
core, or just clock up the current one, or do nothing, but at least it knows
what we intend to do. we're not running this thread just to do some async i/o
or parse an xml datastream in the background or whatever. this task we want to
have as low latency as possible until it's done, so "pretty please with a
cherry on top" do the best you can to make it happen given overall system
constraints.

but such things don't exist, and every kernel dev poo-poos the idea, thinking
there is some magical way to make this happen when the kernel has no CLUE what
code is about to execute. the only alternatives are nasty things like asking
to raise the priority of this thread explicitly; or keeping a thread pool with
a thread that, when we wake up from handling events after idle, sits in a
tight for (;;) check_go(); poll for a go flag, spinning the cpu as fast as it
can to hopefully force the kernel to clock up the core it's running on; or we
need root access to /sys to mess with cpufreq, pin this thread to a specific
core, force that core to top clock rate, and do our own cpu migration
ourselves in userspace co-operatively by changing our affinity explicitly,
etc. etc. ...
-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel