To obtain the best performance, Julia uses every CPU feature it knows how 
to use that is available on the machine it compiles for.  To run the same 
binaries on different machines, both architectures have to offer the same 
features.  So you either have to compile for the lowest common denominator 
of the architectures, possibly giving up performance on the machine with 
more features, or you have to re-compile on each machine. 
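As a sketch of the first option, assuming the MARCH convention mentioned in 
DISTRIBUTING.md (the Make.user file name and the OPENBLAS_DYNAMIC_ARCH 
flag are my assumptions about the build system, not something confirmed in 
this thread):

```make
# Make.user sketch for a portable source build of Julia.
# MARCH pins the target ISA to a baseline every node supports;
# x86-64 is the most conservative choice, core2 slightly newer.
MARCH = x86-64
# Assumption: ask openblas to detect the CPU at runtime and pick
# matching kernels, instead of baking one ISA in at compile time.
OPENBLAS_DYNAMIC_ARCH = 1
```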

The OP doesn't make it clear which version the 2670 chips are, but, for 
example, a 2680 v3 has different AVX capabilities from a 2670 v2, so code 
compiled for the newer chip may not run on the older one without 
re-compilation.

So it's not as simple as just ARM vs. Intel.  Architectural advances 
happen between two Intel releases as well, and those can break binary 
compatibility.

All languages that compile to machine code have this problem.  Languages 
like Java compile to machine code at runtime, which avoids it, but at a 
performance cost.  You can do the same with Julia for the Julia code 
itself, but libraries written in other languages still need to be binary 
compatible.

On Tuesday, November 24, 2015 at 7:49:04 AM UTC+10, Páll Haraldsson wrote:
>
> On Monday, November 23, 2015 at 5:08:16 PM UTC, Yichao Yu wrote:
>>
>> On Mon, Nov 23, 2015 at 11:43 AM, Ian Watson <[email protected]> wrote: 
>> > As far as I know, the underlying O/S and software on both machines is 
>> the 
>> > same, Red Hat Enterprise Linux Server release 6.6 (Santiago), but I 
>> cannot 
>> > be 100% certain about that. I am compiling with gcc-5.2.0 
>> > 
>> > I tried the suggestion in DISTRIBUTING.md, but setting MARCH to either 
>> core2 
>> > or x86-64 failed on the E5-2680 machine - openblas usually fails 
>>
>> Yes, you should build with an architecture that is compatible with all 
>> the ones you want to run on.
>>
>
> I do not see a reason why the architecture "should" be the same (except, 
> as he says, "but that can wait for another day"; of course start with the 
> low-hanging fruit). It's not clear to me from the link whether installing 
> on heterogeneous clusters is simply no longer supported (because it is 
> not worth much?), but it seems it easily could be. Was it supported 
> before and then broken, and is that why it is no longer supported?
>
> Is it just not worth the effort? Not a large payoff? No need to read 
> further if you don't care about my speculations..
>
>
> I would think there are a lot of such clusters (maybe a large fraction, 
> or a majority?). If some parts of a cluster are older/slower, is that a 
> big drawback (aside from the hassle of getting different ISAs to work)? 
> [You might need some load balancing? Wouldn't the code at least work, 
> with the slower nodes possibly dragging down performance, but never 
> giving wrong results, more race conditions, or the like?]
>
> I expect the bitness needs to be the same (not really much of a 
> problem), and the endianness (or not?); in theory x86 and ARM etc. could 
> be combined (not a good idea?), and even the OSes could differ (again, 
> maybe not a good idea, and no good reason?).
>
> I just ask out of curiosity; it seems Julia would be ideal for stuff 
> like this, and for, say, x86 plus ARM it is not really possible (or at 
> least a hassle) for the "competition" (e.g. C++). An exception might be 
> Java/the JVM and the CLR etc., but at least the JVM (both?) does not 
> have multi-dimensional arrays. (I'm not sure JVMs are much of a 
> competitor, but IBM did some heroic optimizations to get 
> multidimensional arrays to work fast(er) with [I think only] their JVM 
> and compiler; I'm not sure this is used much.)
>
>
> The alternative for established languages is a differently compiled 
> executable on each node - that might not be out of the question in a 
> source-code environment (is it already done that way?) - and/or an 
> executable/library that dynamically chooses between machine-code 
> variants, as in e.g.: "openblas defaults to detecting the runtime system 
> and picking the best code to run for that computer." [See above; why did 
> openblas then fail? I guess because of "binutils too old", and it 
> shouldn't have otherwise.] I wonder how commonly code is compiled that 
> way, even just for x86 variants (it has a runtime cost.. and I assume it 
> is not done as fat binaries for x86/ARM, and wouldn't scale to more..). 
>
> -- 
> Palli.
>
>
