On Thu, 22 Jan 2015 04:58 AM, Carsten Haitzler <ras...@rasterman.com> said:
> could you send the patch needed for this change? i like the theoretical number improvement.
> what i want to know is how invasive is it to the code.
--- a/configure.ac
+++ b/configure.ac
@@ -1666,6 +1666,12 @@ if test "x${have_pixman}" = "xyes" ; then
    EFL_ADD_FEATURE([EVAS_PIXMAN], [scale_sample], [${have_pixman_scale_sample}])
 fi

+### Check for OpenMP ###
+## --disable-openmp option will be added to configure script incase openmp is detected
+AC_OPENMP
+EFL_ADD_CFLAGS([EVAS], [$OPENMP_CFLAGS])
+EFL_ADD_LIBS([EVAS], [$OPENMP_CFLAGS])
+
 ## Engines

 define([EVAS_ENGINE_DEP_CHECK_FB], [
diff --git a/src/lib/evas/common/evas_scale_sample.c b/src/lib/evas/common/evas_scale_sample.c
index 91a75eb..f97b71b 100644
--- a/src/lib/evas/common/evas_scale_sample.c
+++ b/src/lib/evas/common/evas_scale_sample.c
@@ -587,14 +586,14 @@ scale_rgba_in_to_out_clip_sample_internal(RGBA_Image *src, RGBA_Image *dst,
    else
 #endif
      {
-        /* a scanline buffer */
-        buf = alloca(dst_clip_w * sizeof(DATA32));
-
         /* image masking */
         if (dc->clip.mask)
           {
              RGBA_Image *im = dc->clip.mask;

+             /* a scanline buffer */
+             buf = alloca(dst_clip_w * sizeof(DATA32));
+
              for (y = 0; y < dst_clip_h; y++)
                {
                   dst_ptr = buf;
@@ -618,6 +617,28 @@ scale_rgba_in_to_out_clip_sample_internal(RGBA_Image *src, RGBA_Image *dst,
           }
         else
           {
+#if _OPENMP
+             buf = malloc(dst_clip_w * dst_clip_h * sizeof(DATA32));
+             if (!buf) return;
+
+             #pragma omp parallel for private(dst_ptr, x, ptr)
+             for (y = 0; y < dst_clip_h; y++)
+               {
+                  dst_ptr = buf + (dst_clip_w * y);
+                  for (x = 0; x < dst_clip_w; x++)
+                    {
+                       ptr = row_ptr[y] + lin_ptr[x];
+                       *dst_ptr = *ptr;
+                       dst_ptr++;
+                    }
+                  /* * blend here [clip_w *] buf -> dst_ptr * */
+                  func((dst_ptr - dst_clip_w), NULL, dc->mul.col, (dptr + (y * dst_w)), dst_clip_w);
+               }
+             if (buf) free(buf);
+#else
+             /* a scanline buffer */
+             buf = alloca(dst_clip_w * sizeof(DATA32));
+
              for (y = 0; y < dst_clip_h; y++)
                {
                   dst_ptr = buf;
@@ -633,6 +654,7 @@ scale_rgba_in_to_out_clip_sample_internal(RGBA_Image *src, RGBA_Image *dst,

                   dptr += dst_w;
                }
+#endif
           }
      }
 }

On Wed, 21 January, 2015 07:19 PM, Cedric BAIL <cedric.b...@free.fr> said:
> Could you provide before and
> after full result from expedite. We had in the past
> some parallelization code and it ended up triggering slower path for most case.
> It also resulted in consuming massively more CPU with little improvement for the
> few tests case where we could see an improvement.

Image Blend Nearest Scaled: 127.40 191.03 ( +49.9%)

The above is the expedite-cmp result on I3. All other test cases changed by less than +/- 5%.
Please note that I ran expedite with -c (count) set to 10000; otherwise small test cases like "Textblock Intl" give varying numbers.
The patch above affected only the "Image Blend Nearest Scaled" test case, which showed a consistent improvement of ~48%.

> Basically, if by adding 1 core, you gain 30% and now both core are running at 100%,
> we are not sure it is a good tradeoff. But if by having 3 more cores, you only get 48% speed
> increase and all core are running at 100%, then we are sure it is not a wise choice from an energy perspective.

You can expect a speedup that is linear in the number of cores when the load is large enough. But in evas/expedite we mostly scale small or medium-sized images with small clip areas. I believe this is the reason for the lower speedup. And yes, we definitely need to benchmark energy consumption in this case.

> We need extensive benchmark on speed, memory and battery consumption for this move.
> One of the change we can investigate is to do that manually and tweak it to only start a meaningful
> number of core depending on the task. In many case the main issue is memory bandwidth and we
> can maybe have some gain first by doing light compression on some more data. Anyway, it's an area that
> require a lot of experimentation and data to move forward. We are definitively interested by more information.

Yes, we can tweak the number of threads using the OpenMP APIs. The above patch (and some other experimental patches) behaves very differently on an ARM SoC compared to a desktop PC.
We may have to selectively enable some optimizations based on arch.

Thanks,
Krishnaraj
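
For illustration only (this is not part of the patch above): a minimal sketch of how the thread count could be limited per call with standard OpenMP clauses, so that small clip areas stay single-threaded. The function name sample_rows and the two thresholds PARALLEL_MIN_PIXELS and PARALLEL_MAX_THREADS are hypothetical placeholders that would have to come from benchmarking on each arch; only the `if` and `num_threads` clauses themselves are standard OpenMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff: below this many output pixels, run single-threaded. */
#define PARALLEL_MIN_PIXELS (256 * 256)
/* Hypothetical cap on the number of threads used by this loop. */
#define PARALLEL_MAX_THREADS 2

/*
 * Nearest-neighbour sampling with precomputed row pointers and column
 * offsets, in the spirit of row_ptr[] / lin_ptr[] in the patch.
 * Blending is left out; each output row is written independently,
 * so rows can be distributed across threads.
 */
void
sample_rows(uint32_t *dst, int dst_w, int dst_h,
            const uint32_t **row_ptr, const int *lin_ptr)
{
   int y;

   /* The if() clause keeps small jobs serial; num_threads() caps the rest.
    * Without OpenMP support the loop simply runs serially. */
#ifdef _OPENMP
#pragma omp parallel for if(dst_w * dst_h >= PARALLEL_MIN_PIXELS) num_threads(PARALLEL_MAX_THREADS)
#endif
   for (y = 0; y < dst_h; y++)
     {
        uint32_t *d = dst + ((size_t)dst_w * y);
        int x;

        for (x = 0; x < dst_w; x++)
          d[x] = row_ptr[y][lin_ptr[x]];
     }
}
```

The same limit could be applied at runtime with omp_set_num_threads() instead of the num_threads clause; the clause form is used here only because it keeps the decision local to the one loop.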