Re: GCC -msse2 portability question

Loic Dachary Wed, 26 Mar 2014 15:13:25 -0700


On 26/03/2014 19:44, Milosz Tanski wrote:
> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <[email protected]> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 & sse3) => use a plugin compiled with sse2 + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
> 
> Out of curiosity, does the else (generic) case fall back to sse2 on x86_64?
> Since sse2 is the guaranteed baseline on x86_64 and I'm guessing that most
> ceph servers are x86_64.


It does not activate any -msse flags, which is conservative until
"erasure-code: fine grain SSE support" ( http://tracker.ceph.com/issues/7865 )
is resolved. I'm assuming most Intel processors running ceph will have SSE3
and only a few will not, based on what
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
shows. But it's a gut feeling. Do you think this is a mistake?
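
One way to check what the generic build really gets is to ask the compiler
which SSE levels it assumes. The sketch below is only an illustration: the
enum and function names are made up for this example and are not part of ceph
or gf-complete. It relies on the __SSE2__, __SSE3__, __SSSE3__ and __SSE4_1__
macros that GCC predefines when the matching -msse* option is in effect. GCC
targeting x86_64 defines __SSE2__ even when no -msse flag is given, because
SSE2 is part of the x86_64 baseline, so a plugin built without any -msse flag
may still contain SSE2 instructions there.

```c
#include <stdio.h>

/* Bit flags for the SSE levels the compiler was allowed to emit.
 * These names are hypothetical, chosen for this example only. */
enum {
    BUILT_SSE2   = 1,
    BUILT_SSE3   = 2,
    BUILT_SSSE3  = 4,
    BUILT_SSE4_1 = 8
};

/* Report which SSE levels were enabled at compile time, based on the
 * macros GCC predefines for the active -msse* options. */
int built_in_sse_levels(void)
{
    int levels = 0;
#ifdef __SSE2__
    levels |= BUILT_SSE2;
#endif
#ifdef __SSE3__
    levels |= BUILT_SSE3;
#endif
#ifdef __SSSE3__
    levels |= BUILT_SSSE3;
#endif
#ifdef __SSE4_1__
    levels |= BUILT_SSE4_1;
#endif
    return levels;
}

/* Print one line describing the compile-time SSE levels. */
void print_built_in_sse_levels(void)
{
    int levels = built_in_sse_levels();
    printf("sse2=%d sse3=%d ssse3=%d sse4.1=%d\n",
           !!(levels & BUILT_SSE2), !!(levels & BUILT_SSE3),
           !!(levels & BUILT_SSSE3), !!(levels & BUILT_SSE4_1));
}
```

Calling print_built_in_sse_levels() from a program built once with no flags
and once with -mssse3 shows what each plugin flavor was compiled for. The same
information is available without writing any code from
"gcc -dM -E -x c /dev/null | grep SSE".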

This is how it looks at the moment:

https://github.com/dachary/ceph/commit/e7875af10bf92c557b1ef97ffcd871dfe617c160
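
For illustration only (this is not the code in the commit above), the
three-way selection quoted above could be sketched in C roughly as follows. It
assumes GCC 4.8 or later for __builtin_cpu_supports. The enum, struct and
function names are hypothetical, and treating "sse3" as SSE3 plus SSSE3 and
"sse4" as SSE4.1 plus SSE4.2 is one reading of the thread, not something that
has been settled.

```c
#include <stdio.h>

/* Hypothetical plugin flavors, mirroring the three cut-offs discussed. */
enum plugin_flavor { PLUGIN_GENERIC, PLUGIN_SSE3, PLUGIN_SSE4 };

/* CPU features relevant to the choice (illustrative grouping):
 * sse3 means SSE3 and SSSE3, sse4 means SSE4.1 and SSE4.2. */
struct cpu_features {
    int sse3;
    int sse4;
};

/* Probe the running CPU. On non-x86 targets nothing is reported,
 * so the generic plugin is selected. */
struct cpu_features probe_cpu(void)
{
    struct cpu_features f = {0, 0};
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    f.sse3 = __builtin_cpu_supports("sse3") &&
             __builtin_cpu_supports("ssse3");
    f.sse4 = __builtin_cpu_supports("sse4.1") &&
             __builtin_cpu_supports("sse4.2");
#endif
    return f;
}

/* if (sse4 & sse3) => sse4 plugin; else if (sse3) => sse3 plugin;
 * else => generic plugin. */
enum plugin_flavor choose_plugin(struct cpu_features f)
{
    if (f.sse4 && f.sse3)
        return PLUGIN_SSE4;
    if (f.sse3)
        return PLUGIN_SSE3;
    return PLUGIN_GENERIC;
}

/* Human-readable name of a flavor (names are placeholders). */
const char *plugin_name(enum plugin_flavor p)
{
    switch (p) {
    case PLUGIN_SSE4: return "sse4";
    case PLUGIN_SSE3: return "sse3";
    default:          return "generic";
    }
}

/* Print which flavor would be loaded on the running machine. */
void report_plugin(void)
{
    printf("selected plugin flavor: %s\n",
           plugin_name(choose_plugin(probe_cpu())));
}
```

Keeping choose_plugin() separate from probe_cpu() means the selection logic
can be exercised with made-up feature sets, without needing a machine that
lacks SSE3 or SSE4.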

Cheers

> 
>>
>> like so:
>>
>> https://github.com/dachary/ceph/commit/b6e4307bd2ee1de6e8bbda0ced370d484d512114#diff-5249f49580782dfe95a1cbcc986ee5deR113
>>
>> If I understand Laurent correctly, the right approach would be to 
>> semi-transparently generate and select the code path depending on the 
>> features at runtime. But that would require more work and I created a ticket 
>> to track this : http://tracker.ceph.com/issues/7865
>>
>> Does that sound right ?
>>
>> On 25/03/2014 22:31, Kevin Greenan wrote:
>>> Hey Loic,
>>>
>>> I think we want something closer to what Milosz is proposing (3 cut-offs 
>>> instead of 2) .  The shuffle instruction is part of SSSE3 and is the basis 
>>> for the SSE split table techniques, which are super fast.  By doing 
>>> all-or-nothing, it is possible many users would not be able to take 
>>> advantage of it when they are capable.
>>>
>>> Make sense?
>>>
>>> -kevin
>>>
>>>
>>> On Tue, Mar 25, 2014 at 12:46 PM, Milosz Tanski <[email protected]> wrote:
>>>
>>>     It gets a bit more tricky with x86_64 since the arch dictates that the
>>>     baseline has SSE2 (but not necessarily anything later).
>>>
>>>     What I would do is both support SSE2 (maybe in core without dlopen) and
>>>     then support all the others in a SSE4 version (including SSE4_PCMUL).
>>>     I'm glossing over x86-32 here, but you could do something similar.
>>>
>>>     Best
>>>     - Milosz
>>>
>>>     On Tue, Mar 25, 2014 at 3:21 PM, Loic Dachary <[email protected]> wrote:
>>>     >
>>>     >
>>>     > On 25/03/2014 20:13, Kevin Greenan wrote:
>>>     >> +1
>>>     >>
>>>     >> Yeah, that sounds better...  Let's keep this as simple as possible.
>>>     >
>>>     > I'll rework the pull request at
>>>     > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse
>>>     > accordingly.
>>>     >
>>>     > Would it be sensible to compile with SSE optimizations only if all
>>>     > are available ( SSE2, SSSE3, SSE4, SSE4_PCMUL ) and not attempt to
>>>     > distinguish between cases such as SSSE3 being available but not
>>>     > SSE4_PCMUL? From what I understand at this point, that kind of
>>>     > distinction is going to be difficult to manage anyway.
>>>     >
>>>     > Is it too simplistic ?
>>>     >
>>>     >>
>>>     >> -kevin
>>>     >>
>>>     >>
>>>     >> On Tue, Mar 25, 2014 at 12:08 PM, Loic Dachary <[email protected]> wrote:
>>>     >>
>>>     >>     Andreas Peters suggested another approach, which makes sense to
>>>     >>     me: have one plugin with SSE optimizations enabled, another
>>>     >>     without them, and choose at runtime between the two.
>>>     >>
>>>     >>     What do you think ?
>>>     >>
>>>     >>     On 23/03/2014 20:50, Loic Dachary wrote:
>>>     >>     > Hi Laurent,
>>>     >>     >
>>>     >>     > In the context of optimizing erasure code functions implemented
>>>     >>     > by Kevin Greenan (cc'ed) and James Plank at
>>>     >>     > https://bitbucket.org/jimplank/gf-complete/ we ran across a
>>>     >>     > question you may have the answer to: can gcc -msse2 (or -msse*
>>>     >>     > for that matter) have a negative impact on the portability of
>>>     >>     > the compiled binary code?
>>>     >>     >
>>>     >>     > In other words, if code is compiled without -msse* and runs fine
>>>     >>     > on all Intel processors it targets, could adding -msse* to the
>>>     >>     > compilation of the same source code generate a binary that would
>>>     >>     > fail on some processors? This is assuming no SSE-specific
>>>     >>     > functions were used in the source code.
>>>     >>     >
>>>     >>     > In gf-complete, all SSE-specific instructions are carefully
>>>     >>     > protected so that they are not run on a CPU that does not
>>>     >>     > support them. The runtime detection is done by checking CPUID
>>>     >>     > bits (see
>>>     >>     > https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28
>>>     >>     > )
>>>     >>     >
>>>     >>     > The corresponding thread is at:
>>>     >>     >
>>>     >>     > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>>     >>     >
>>>     >>     > Cheers
>>>     >>     >
>>>     >>
>>>     >>     --
>>>     >>     Loïc Dachary, Artisan Logiciel Libre
>>>     >>
>>>     >>
>>>     >
>>>     > --
>>>     > Loïc Dachary, Artisan Logiciel Libre
>>>     >
>>>
>>>
>>>
>>>     --
>>>     Milosz Tanski
>>>     CTO
>>>     10 East 53rd Street, 37th floor
>>>     New York, NY 10022
>>>
>>>     p: 646-253-9055
>>>     e: [email protected]
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

