You couldn't really preserve the semantics as Julia is a much more dynamic 
language.  ISPC can do what it does because the kernel language is fairly 
restrictive.

On Wednesday, September 24, 2014 11:30:56 AM UTC-4, Sebastian Good wrote:
>
> ... though I suspect to really profit from masked vectorization like this, 
> it needs to be tackled at a much lower level in the compiler, likely even 
> as an LLVM optimization pass, guided only by some hints from Julia itself.
>
> *Sebastian Good*
>
>  
> On Wed, Sep 24, 2014 at 10:16 AM, Sebastian Good wrote:
>
>> I've been thinking about this a bit, and as usual, Julia's multiple 
>> dispatch might make such a thing possible in a novel way. The heart of ISPC 
>> is allowing a function that looks like
>>
>> int addScalar (int a, int b) { return a + b; }
>>
>> effectively be
>>
>> vector<int> addVector (vector<int> a, vector<int> b) { return /*AVX 
>> version of */a + b; }
>>
>> This is what vectorizing compilers do, but they don't handle control flow 
>> like ISPC does. Also, ISPC's "foreach" and "foreach_tiled" allow these 
>> vectorized functions to be consumed more efficiently, for instance by 
>> handling the ragged/unaligned front and back of arrays with scalar 
>> versions, and the middle bits with vectorized functions.
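
A minimal sketch of the front/middle/back split described above, written in portable C++ rather than ISPC. The lane count kLanes, the helper name addForeach, and the choice to align on the output pointer only are assumptions made for illustration; this is not ISPC's actual implementation of "foreach".

```cpp
#include <cstddef>
#include <cstdint>

// Assumed lane count: 8 x 32-bit ints, i.e. one 256-bit register.
constexpr std::size_t kLanes = 8;

// Scalar version, used for the ragged front and back.
inline int addScalar(int a, int b) { return a + b; }

// Vector version: handles exactly kLanes elements. The fixed trip count
// lets a vectorizing compiler emit a single wide add for this loop.
inline void addVector(const int* a, const int* b, int* out) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = a[lane] + b[lane];
    }
}

// Hypothetical foreach-style driver: scalar front, vector middle, scalar back.
void addForeach(const int* a, const int* b, int* out, std::size_t n) {
    const std::size_t vectorBytes = kLanes * sizeof(int);
    std::size_t i = 0;

    // Ragged front: step one element at a time until `out + i` sits on a
    // vector-sized boundary (only the output pointer is aligned here).
    while (i < n &&
           reinterpret_cast<std::uintptr_t>(out + i) % vectorBytes != 0) {
        out[i] = addScalar(a[i], b[i]);
        ++i;
    }

    // Middle: whole groups of kLanes elements go through the vector version.
    for (; i + kLanes <= n; i += kLanes) {
        addVector(a + i, b + i, out + i);
    }

    // Ragged back: fewer than kLanes elements remain.
    for (; i < n; ++i) {
        out[i] = addScalar(a[i], b[i]);
    }
}
```

Calling addForeach(a, b, out, n) gives the same result as a plain scalar loop for any n and any starting alignment; only the middle section is eligible for wide operations.
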
>>
>> With support for hardware vectors in Julia, you can start to imagine 
>> writing macros that automatically generate the relevant functions, e.g. 
>> generating addVector from addScalar. However, to do anything cleverer than 
>> the (already extremely clever) LLVM vectorizer, you have to expose masking 
>> operations. To handle incoherent/divergent control flow, you issue vector 
>> operations that are masked, allowing some lanes of the vector to stop 
>> participating in the program for a period.  In a contrived example
>>
>> int addScalar(int a, int b) { return a % 2 ? a + b : a - b; }
>>
>> would be turned into something like the below
>>
>> vector<int> addVector(vector<int> a, vector<int> b) {
>>   mask = all; // a register with all 1s, indicating all lanes participate
>>   vector<int> mod = a % 2; // vectorized, using mask
>>   mask = maskwhere(mod != 0); // only lanes where a is odd stay active
>>   vector<int> result = a + b; // vectorized, using mask
>>   mask = invert(mask); // now only lanes where a is even are active
>>   result = a - b; // vectorized, using mask; odd lanes keep a + b
>>   return result;
>> }
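
The masked version above can be emulated in portable C++ to make the lane semantics concrete. This is only a sketch under assumptions: the lane count kLanes, the IntVec and Mask types, and the maskedAssign helper are hypothetical names introduced here, and real hardware would use mask registers or blend instructions rather than per-lane loops.

```cpp
#include <array>
#include <cstddef>

// Assumed lane count; real hardware fixes this per instruction set.
constexpr std::size_t kLanes = 8;
using IntVec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Scalar reference from the message above.
inline int addScalar(int a, int b) { return a % 2 ? a + b : a - b; }

// Hypothetical helper: write src into dst only on lanes where mask is set.
inline void maskedAssign(IntVec& dst, const IntVec& src, const Mask& mask) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (mask[lane]) dst[lane] = src[lane];
    }
}

IntVec addVector(const IntVec& a, const IntVec& b) {
    IntVec sum{}, diff{}, result{};
    Mask mask{};

    // Both branches are computed on every lane; the mask decides which
    // result each lane keeps.
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        mask[lane] = (a[lane] % 2) != 0;   // lanes where a is odd
        sum[lane] = a[lane] + b[lane];
        diff[lane] = a[lane] - b[lane];
    }

    maskedAssign(result, sum, mask);       // "then" branch on odd lanes

    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        mask[lane] = !mask[lane];          // invert: lanes where a is even
    }
    maskedAssign(result, diff, mask);      // "else" branch on even lanes

    return result;
}
```

Each lane of addVector(a, b) matches addScalar applied to that lane. The cost of divergence is visible here: both a + b and a - b are evaluated for all lanes, whichever branch a given lane takes.
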
>>
>> If you look at it closely, you've got versions generated for each 
>> function that are
>> - scalar
>> - vector-enabled, but for arbitrary length vectors
>> - specialized for (one or more hardware) vector sizes
>> - specialized by alignment (as vector sizes get bigger, e.g. the 32- and 
>> 64-byte AVX versions coming out, you can't just rely on the runtime to 
>> align everything properly; that would be too wasteful)
>>
>> So, I think it's a big ask, but I think it could be produced 
>> incrementally. We'd need help from the Julia language/standard library 
>> itself to expose masked vector operations.  
>>
>>
>> *Sebastian Good*
>>
>>  
>> On Tue, Sep 23, 2014 at 2:52 PM, Jeff Waller wrote:
>>
>>> Could this theoretical thing be approached incrementally?  Meaning 
>>> here's a project and here are some intermediate results and now it's 1.5x 
>>> faster, and now here's something better and it's 2.7x, all the while the 
>>> goal is apparent but difficult.  
>>>
>>> Or would it be all-or-nothing: either it works or it doesn't?
>>>
>>
>>
>
