You couldn't really preserve the semantics as Julia is a much more dynamic language. ISPC can do what it does because the kernel language is fairly restrictive.
On Wednesday, September 24, 2014 11:30:56 AM UTC-4, Sebastian Good wrote:
>
> ... though I suspect to really profit from masked vectorization like this,
> it needs to be tackled at a much lower level in the compiler, likely even
> as an LLVM optimization pass, guided only by some hints from Julia itself.
>
> *Sebastian Good*
>
>
> On Wed, Sep 24, 2014 at 10:16 AM, Sebastian Good <[email protected]> wrote:
>
>> I've been thinking about this a bit, and as usual, Julia's multiple
>> dispatch might make such a thing possible in a novel way. The heart of
>> ISPC is allowing a function that looks like
>>
>> int addScalar (int a, int b) { return a + b; }
>>
>> effectively be
>>
>> vector<int> addVector (vector<int> a, vector<int> b) { return /*AVX
>> version of */a + b; }
>>
>> This is what vectorizing compilers do, but they don't handle control flow
>> like ISPC does. Also, ISPC's "foreach" and "foreach_tiled" allow these
>> vectorized functions to be consumed more efficiently, for instance by
>> handling the ragged/unaligned front and back of arrays with scalar
>> versions, and the middle bits with vectorized functions.
>>
>> With support for hardware vectors in Julia, you can start to imagine
>> writing macros that automatically generate the relevant functions, e.g.
>> generating addVector from addScalar. However, to do anything cleverer than
>> the (already extremely clever) LLVM vectorizer, you have to expose masking
>> operations. To handle incoherent/divergent control flow, you issue vector
>> operations that are masked, allowing some lanes of the vector to stop
>> participating in the program for a period. In a contrived example
>>
>> int addScalar(int a, int b) { return a % 2 ? a + b : a - b; }
>>
>> would be turned into something like the below
>>
>> vector<int> addVector(vector<int> a, vector<int> b) {
>>   mask = all; // a register with all 1s, indicating all lanes participate
>>   int mod = a % 2; // vectorized, using mask
>>   mask = maskwhere(mod != 0);
>>   vector<int> result = a + b; // vectorized, using mask
>>   mask = invert(mask);
>>   result = a - b; // vectorized, using mask
>>   return result;
>> }
>>
>> If you look at it closely, you've got versions generated for each
>> function that are
>> - scalar
>> - vector-enabled, but for arbitrary length vectors
>> - specialized for (one or more hardware) vector sizes
>> - specialized by alignment (as vector sizes get bigger, e.g. the 32- and
>> 64-byte AVX versions coming out, you can't just rely on the runtime to
>> align everything properly, it will be too wasteful)
>>
>> So, I think it's a big ask, but I think it could be produced
>> incrementally. We'd need help from the Julia language/standard library
>> itself to expose masked vector operations.
>>
>>
>> *Sebastian Good*
>>
>>
>> On Tue, Sep 23, 2014 at 2:52 PM, Jeff Waller <[email protected]> wrote:
>>
>>> Could this theoretical thing be approached incrementally? Meaning
>>> here's a project and here's some intermediate results and now it's 1.5x
>>> faster, and now here's something better and it's 2.7x, all the while the
>>> goal is apparent but difficult.
>>>
>>> Or would it kind of be all works or doesn't?
>>>
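For anyone who wants to poke at the masked addVector idea quoted above, here is a rough sketch in plain C. It is not ISPC's implementation and not anything Julia or LLVM generates; it only emulates lanes and masks with ordinary loops. LANES, vint, vmask, add_vector and add_arrays are names made up for this sketch. One detail that is easy to miss: inverting the mask has to stay inside the set of lanes that were active on entry, otherwise lanes that were switched off by an outer branch would wake up again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LANES 8 /* hypothetical vector width, e.g. 8 x 32-bit ints */

typedef struct { int  lane[LANES]; } vint;  /* stand-in for a hardware vector */
typedef struct { bool lane[LANES]; } vmask; /* one flag per lane */

/* Scalar original from the quoted message. */
static int add_scalar(int a, int b) { return a % 2 ? a + b : a - b; }

/* Masked version: both sides of the branch run, and the mask decides
 * which lanes each side is allowed to write. Inactive lanes stay 0. */
static vint add_vector(vint a, vint b, vmask active)
{
    vint  result = {{0}};
    vmask odd;

    /* mask = maskwhere(a % 2 != 0), restricted to the active lanes */
    for (int i = 0; i < LANES; i++)
        odd.lane[i] = active.lane[i] && (a.lane[i] % 2 != 0);

    /* "then" side: a + b, written only where the mask is set */
    for (int i = 0; i < LANES; i++)
        if (odd.lane[i]) result.lane[i] = a.lane[i] + b.lane[i];

    /* "else" side: inverted mask, still limited to the active lanes */
    for (int i = 0; i < LANES; i++)
        if (active.lane[i] && !odd.lane[i])
            result.lane[i] = a.lane[i] - b.lane[i];

    return result;
}

/* foreach-style driver: whole vectors for the body of the array,
 * the scalar function for the ragged tail. */
static void add_arrays(const int *a, const int *b, int *out, size_t n)
{
    assert(n == 0 || (a && b && out));

    vmask all;
    for (int i = 0; i < LANES; i++) all.lane[i] = true;

    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        vint va, vb, vr;
        memcpy(va.lane, a + i, sizeof va.lane);
        memcpy(vb.lane, b + i, sizeof vb.lane);
        vr = add_vector(va, vb, all);
        memcpy(out + i, vr.lane, sizeof vr.lane);
    }
    for (; i < n; i++)
        out[i] = add_scalar(a[i], b[i]);
}
```

Calling add_arrays on an array of 11 ints with LANES set to 8 sends the first 8 elements through add_vector and the last 3 through add_scalar. A real implementation would replace the per-lane loops with masked hardware instructions, and could also run the tail through add_vector with a partial mask instead of falling back to scalar code.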
