On 09/11/2014 04:58 PM, Jason Ekstrand wrote:


On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <die...@nuetzel-hh.de
<mailto:die...@nuetzel-hh.de>> wrote:

    On 12.09.2014 00:31, Jason Ekstrand wrote:

        On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel
        <die...@nuetzel-hh.de <mailto:die...@nuetzel-hh.de>>
        wrote:

            On 15.08.2014 04:50, Jason Ekstrand wrote:

                On Aug 14, 2014 7:13 PM, "Dieter Nützel"
                <die...@nuetzel-hh.de <mailto:die...@nuetzel-hh.de>>
                wrote:


                    On 15.08.2014 02:36, Dave Airlie wrote:

                                On 08/02/2014 02:11 PM, Jason Ekstrand
                                wrote:



                                    Most format conversion operations
                                    required by GL can be performed by
                                    converting one channel at a time,
                                    shuffling the channels around, and
                                    optionally filling missing channels
                                    with zeros and ones. This adds a
                                    function to do just that in a
                                    general, yet efficient, way.

                                    v2:
                                    * Add better comments including full
                                      docs for functions
                                    * Don't use __typeof__
                                    * Use inline helpers instead of
                                      writing out conversions by hand
                                    * Force full loop unrolling for
                                      better performance
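
The idea in the patch description (convert one channel at a time, shuffle channels, fill missing ones with 0 or 1) can be illustrated with a small C sketch. This is not the code from the patch: the function swizzle_convert_ubyte_to_float, the helper unorm8_to_float and the selector values SWZ_ZERO/SWZ_ONE are hypothetical names chosen for illustration, and only a single source type (normalized unsigned bytes) and destination type (float) are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle selectors: 0-3 pick a source channel,
 * SWZ_ZERO / SWZ_ONE fill the destination channel with a constant. */
enum { SWZ_ZERO = 4, SWZ_ONE = 5 };

/* Convert one normalized unsigned byte channel to a float in [0, 1]. */
static inline float
unorm8_to_float(uint8_t x)
{
   return x / 255.0f;
}

/* Sketch: convert `count` pixels from an N-channel ubyte source to
 * 4-channel float output, one channel at a time. swizzle[c] names the
 * source channel feeding destination channel c, or a constant fill. */
static void
swizzle_convert_ubyte_to_float(float *dst, const uint8_t *src,
                               int src_channels, const uint8_t swizzle[4],
                               size_t count)
{
   assert(src_channels >= 1 && src_channels <= 4);
   for (size_t i = 0; i < count; i++) {
      const uint8_t *s = src + i * src_channels;
      float *d = dst + i * 4;
      for (int c = 0; c < 4; c++) {
         uint8_t sel = swizzle[c];
         if (sel == SWZ_ZERO) {
            d[c] = 0.0f;               /* missing channel -> 0 */
         } else if (sel == SWZ_ONE) {
            d[c] = 1.0f;               /* missing channel -> 1 */
         } else {
            assert(sel < src_channels);
            d[c] = unorm8_to_float(s[sel]);  /* shuffle + convert */
         }
      }
   }
}
```

For example, a two-channel source pixel {255, 0} with swizzle {1, 0, SWZ_ZERO, SWZ_ONE} swaps the two channels and produces {0.0, 1.0, 0.0, 1.0}.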



                        This file seems to anger gcc a lot.

                        It seems to take upwards of a minute or two to
                        compile here.

                        gcc 4.8.3 on 32-bit x86.

                        Dave.



                    For me (on our poor little Duron 1800/2 GB) it took ~5
                    minutes...

                    gcc 4.8.1 on 32-bit x86.


                If we'd like, the way the macros are set up, it would be
                easy to change it so that we do less unrolling in the
                cases where we are actually doing substantial format
                conversion and wouldn't notice the extra logic quite as
                much. I'll play with it a bit tomorrow or next week and
                see how much of a hit we would actually take if we
                unrolled a little less in places.
                --Jason Ekstrand


            Ping.

            On a second run it took 11+ minutes here...


        11 minutes! What system are you running? And are you using -O3 or
        something? Yes, we can do something to cut it down, but it will
        probably require a configure flag; the question is what flag.

        --Jason


    See above, the old children's system... ;-)
    -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx -mfpmath=sse,387 -pipe

    Bad? It worked for ages on the AthlonMP... 8-)
    Maybe it is bad on the Duron now (the MP flags, a much smaller cache,
    and a newer GCC).

    Dieter


Yeah, my recommendation would be to hack the macros not to unroll and
keep the patch locally. If you've got a better idea as to how to
organize the code so the compiler likes it, I'm open as long as we don't
lose performance.

It looks like a release build with MSVC is taking quite a while to compile this file too (actually at link time when the optimizer kicks in).

But even on my fast Linux system with gcc, the difference in compile time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds).

I'm still prototyping something but it looks like breaking the top-level switch cases in _mesa_swizzle_and_convert() into separate functions reduces the time quite a bit. Let me pursue that a bit further and see how it goes...
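
The restructuring described above, where each top-level switch case becomes its own function and the original entry point only dispatches, might look roughly like the C sketch below. It is a simplified illustration, not the actual rework of _mesa_swizzle_and_convert(): the enum dst_type, swizzle_and_convert and the convert_to_* helpers are hypothetical. Because the helpers are static, a compiler may still inline them back into the dispatcher, so the compile-time effect depends on the compiler. GCC's noinline function attribute is one way to prevent that if needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical destination types; the real code switches on GL types. */
enum dst_type { DST_FLOAT, DST_UBYTE };

/* Former switch case #1, now a separate function. */
static void
convert_to_float(void *dst, const uint8_t *src, int count)
{
   float *d = dst;
   for (int i = 0; i < count; i++)
      d[i] = src[i] / 255.0f;   /* unorm8 -> float in [0, 1] */
}

/* Former switch case #2, now a separate function. */
static void
convert_to_ubyte(void *dst, const uint8_t *src, int count)
{
   uint8_t *d = dst;
   for (int i = 0; i < count; i++)
      d[i] = src[i];            /* same type: plain copy */
}

/* Top-level entry point: it only selects a helper, so the bulk of the
 * conversion code lives in the per-case functions above. */
static void
swizzle_and_convert(void *dst, enum dst_type type,
                    const uint8_t *src, int count)
{
   switch (type) {
   case DST_FLOAT: convert_to_float(dst, src, count); break;
   case DST_UBYTE: convert_to_ubyte(dst, src, count); break;
   default: assert(!"unknown destination type");
   }
}
```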

-Brian

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
