On Sat, Aug 29, 2009 at 1:04 AM, Christian Hergert<ch...@dronelabs.com> wrote:
>
>> On Fri, Aug 28, 2009 at 11:49 PM, Christian Hergert<ch...@dronelabs.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> What you mentioned is good information to start hunting. Was the CPU
>>> time related to IO wait at all? Always get accurate numbers before
>>> performance tuning. "Measure, measure, measure" or so the mantra goes.
>>
>> Perhaps a stupid question, but what is a good way of profiling IO? CPU
>> is easy, but I've never done IO.
>> In this case my HDD is certainly able to do more than 10 thumbnails
>> per second; however, I could see a potential issue when someone with a
>> slower HDD and a faster CPU than mine is thumbnailing a lot of images.
>> There the HDD will likely be the bottleneck.
>
> You can do something really crude by reading from /proc/pid/* (man proc
> for more info). Or you could try using tools like sysstat, oprofile,
> SystemTap, etc. We really need a generic profiling tool that can do all
> of this stuff from a single interface. However, at the current time, I've
> been most successful with just writing one-off graphing for the specific
> problem. For example, put in some g_print() lines, grep for those, and
> then graph them using your favorite plotter or cairo goodness.
>
>>> Unfortunately, the symptom you see regarding IO will very likely change
>>> under a different processing model. If the problem is truly CPU bound,
>>> then you will only be starting IO requests after you are done
>>> processing. This means valuable time is wasted while waiting for the
>>> pages to be loaded into the buffers. The code will just be blocking
>>> while this is going on.
>>
>> And how can I test that?
>
> ltrace works for simple non-threaded applications. Basically, you should
> see in the profiling timings that one work item happens sequentially
> after the previous, such as (load, process, load, process, ...).
>
> I would hate to provide conjecture about the proper design until we have
> more measurements. It is a good idea to optimize the single-threaded
> approach before the multi-core approach, since it would have to be done
> anyway and is likely a less complex problem before the additional
> threads are introduced.
>
>>> What could be done easily is that every time an item starts processing,
>>> it could asynchronously begin loading the next image using GIO. This
>>> means the kernel can start paging that file into the VFS cache while
>>> you are processing the current image. This of course would still mean
>>> you are limited to a single processor doing the scaling. But if the
>>> problem is in fact CPU bound, that next image will almost always be
>>> loaded by the time you finish the scale, meaning you've maximized the
>>> processing potential per core.
>>
>> That sounds like a nice way to optimize it for one core. But could
>> there be any optimization possible in my case, since I have 100% CPU
>> usage for one core with just the benchmark?
>
> You can't properly optimize for the multi-core scenario until the
> single-core scenario is fixed.
>
>>> To support multi-core, like it sounds like you want, a queue could be
>>> used to store the upcoming work items. A worker per core, for example,
>>> can get its next file from that queue. FWIW, I wrote a library,
>>> iris[1], built specifically for doing work like this while efficiently
>>> using threads with minimum lock-contention. It would allow for scaling
>>> up the threads to the number of cores and back down when they are no
>>> longer needed.
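To make the worker-per-core queue concrete, here is a minimal sketch
using plain GThreadPool from GLib (not iris). The worker body is a
placeholder for the real load/scale/save work, and the pool size comes
from sysconf() since the machine's core count shouldn't be hardcoded:

/* Minimal worker-per-core queue sketch using GThreadPool.
 * The worker body below stands in for the real thumbnailing work. */
#include <glib.h>
#include <unistd.h>

static void
thumbnail_worker (gpointer data, gpointer user_data)
{
  gchar *path = data;

  /* load the image, scale it, and save the thumbnail here */
  g_print ("thumbnailing %s\n", path);
  g_free (path);
}

int
main (int argc, char *argv[])
{
  GThreadPool *pool;
  gint i;

  g_thread_init (NULL);  /* required before using GLib threads */

  /* one worker per core, scaled to what the machine actually has */
  pool = g_thread_pool_new (thumbnail_worker, NULL,
                            (gint) sysconf (_SC_NPROCESSORS_ONLN),
                            FALSE, NULL);

  for (i = 1; i < argc; i++)
    g_thread_pool_push (pool, g_strdup (argv[i]), NULL);

  /* FALSE = finish queued items, TRUE = block until the queue drains */
  g_thread_pool_free (pool, FALSE, TRUE);
  return 0;
}

GThreadPool won't give you iris's lock-contention guarantees, but it is
enough to measure whether the work actually scales across cores.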
>>>
>> That sounds very interesting.
>> Just one question about the queue: would it be better to thread the
>> application (nautilus) or the library (glib)? If your answer is the
>> library, then the queue has to be passed from nautilus to glib. I would
>> say glib, because all applications would benefit from it without
>> adjusting their code.
>
> I haven't looked at this code in detail yet, so I cannot confirm or
> deny. My initial assumption would be that the thumbnailing API (again,
> I have no experience with it yet) should be restructured around an
> asynchronous design (begin/end methods) and the synchronous
> implementation built around that. And of course, nobody should use the
> synchronous version unless they *really* have a reason to.
>
> FWIW, I would be willing to help hack on this, but I'm swamped for at
> least the next few weeks.
>
> -- Christian
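For what such a begin/end restructuring could look like, here is a rough
sketch following GIO's _async/_finish naming convention. Every symbol
below is hypothetical; none of this is an existing thumbnailing API, it
only illustrates the shape of the design:

/* Hypothetical async thumbnailing API following GIO's _async/_finish
 * convention -- none of these symbols exist yet. */
#include <gio/gio.h>
#include <gdk-pixbuf/gdk-pixbuf.h>

void       thumbnail_factory_generate_async  (GFile               *file,
                                              gint                 size,
                                              GCancellable        *cancellable,
                                              GAsyncReadyCallback  callback,
                                              gpointer             user_data);

GdkPixbuf *thumbnail_factory_generate_finish (GAsyncResult        *result,
                                              GError             **error);

/* The synchronous variant becomes a thin blocking wrapper around the
 * async pair, for the callers that *really* need it. */
GdkPixbuf *thumbnail_factory_generate_sync   (GFile               *file,
                                              gint                 size,
                                              GCancellable        *cancellable,
                                              GError             **error);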
I guess the next thing for me would be to get more accurate benchmarks.
Right now I have the benchmarks as timings (how long making a pixbuf
from an image takes, how long the scaling takes (surprisingly short!),
and how long saving it takes), but I guess I need to expand that a bit
with IO timings as well. I will just give it a try.
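A rough sketch of how those IO timings could be captured, pairing a
GTimer with the crude /proc reading Christian mentioned -- here the
per-process counters in /proc/self/io, which need kernel task IO
accounting to be enabled:

/* Rough sketch: time one work item and snapshot /proc/self/io around
 * it, to separate elapsed time from bytes actually read/written. */
#include <glib.h>

static void
dump_proc_io (const gchar *label)
{
  gchar *contents = NULL;

  /* rchar/wchar/read_bytes/write_bytes counters; requires kernel
   * task IO accounting (CONFIG_TASK_IO_ACCOUNTING) */
  if (g_file_get_contents ("/proc/self/io", &contents, NULL, NULL))
    g_print ("%s:\n%s", label, contents);
  g_free (contents);
}

int
main (void)
{
  GTimer *timer = g_timer_new ();

  dump_proc_io ("before");
  g_timer_start (timer);

  /* ... load + scale + save one thumbnail here ... */

  g_print ("elapsed: %.3f s\n", g_timer_elapsed (timer, NULL));
  dump_proc_io ("after");

  g_timer_destroy (timer);
  return 0;
}

Diffing the before/after counters per work item should show whether the
time goes to the CPU or to waiting on the disk.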