Hi, does anyone know whether it is possible to make the free Intel C compiler work on Red Hat 8.0? It used to work on RH 7.3, but Jussi L. reported a failure on RH 8.0 too.
I am curious about the Intel compiler's efficiency because I am currently running some resampling/mixing benchmarks using linear and cubic interpolation. With cubic interpolation, which involves a few MULs/ADDs per sample, I get something like this (using gcc 3.2):

Celeron (PII-class): 71 cycles/sample
Athlon: 61 cycles/sample
Pentium 4: 80 cycles/sample

It seems the P4's FPU is rather weak, since for some ops it needs more cycles per instruction than the old Celeron. I heard that Intel's compiler is able to use SIMD to speed up FPU ops, so I would like to see what is achievable with good quality code targeted at the P4. The innermost resampling/mixing loop is very short anyway, so one could probably use a P4-specific asm version (or import the asm generated by icc) to achieve maximum performance. But the first tests seem to indicate (at least to me) that the Athlon is 20-30% faster at resampling/mixing, so I guess an Athlon machine will deliver more voices than a P4 running at the same frequency (or Pentium-frequency-equivalence index).

Currently I am working only in the floating point domain, but Juan L. is telling me about the wonders of integer math and pointed me to the routines found in http://modplug-xmms.sourceforge.net/ but I am unable to compile the package on Red Hat 8.0 (am I a masochist for insisting on that distro? :-) ).

On the other hand, Steve Harris says that on modern CPUs floating point ops are more accelerated than integer ones, and since integer math involves a lot of shifting, you might end up with longer execution times than with the floating point version. (FISTL takes 6 cycles, but that is not much compared to the 70-80 cycles of cubic interpolation, and doing everything in the FP domain saves you a lot of hassle.)

I will publish the benchmark and the results once I have implemented more test cases.
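To make the "few MULs/ADDs per sample" concrete, here is a minimal sketch of the kind of cubic resampling/mixing inner loop being discussed. It is not the actual benchmark code: it assumes a 4-point Catmull-Rom (Hermite) cubic, a fixed positive pitch ratio, and mono float buffers, and the names cubic_interp and resample_mix_cubic are made up for illustration.

```c
#include <stddef.h>

/* 4-point cubic (Catmull-Rom) interpolation, evaluated in Horner form.
 * ym1, y0, y1, y2 are four consecutive input samples; frac in [0,1)
 * is the fractional read position between y0 and y1.
 * (Assumption: the post does not say which cubic variant is used.) */
static float cubic_interp(float ym1, float y0, float y1, float y2, float frac)
{
    float c0 = y0;
    float c1 = 0.5f * (y1 - ym1);
    float c2 = ym1 - 2.5f * y0 + 2.0f * y1 - 0.5f * y2;
    float c3 = 0.5f * (y2 - ym1) + 1.5f * (y0 - y1);
    return ((c3 * frac + c2) * frac + c1) * frac + c0;
}

/* Resample `in` at a fixed pitch ratio and mix (add) the result into `out`.
 * Reading starts at input index 1 so that in[i - 1] always exists, and
 * stops when in[i + 2] would run past the end of the input.
 * Returns the number of output samples written. */
static size_t resample_mix_cubic(const float *in, size_t in_len,
                                 float *out, size_t out_len,
                                 double pitch, float gain)
{
    double pos = 1.0;   /* fractional read position in the input */
    size_t n = 0;       /* output samples written so far */

    while (n < out_len) {
        size_t i = (size_t)pos;          /* integer part of the position */
        if (i + 2 >= in_len)
            break;                       /* not enough input left */
        float frac = (float)(pos - (double)i);
        out[n] += gain * cubic_interp(in[i - 1], in[i], in[i + 1], in[i + 2], frac);
        pos += pitch;
        n++;
    }
    return n;
}
```

The float-to-integer conversion of the read position on every sample is where a FISTL-style instruction would show up in the generated code; everything else is plain FP multiplies and adds, which is the part a SIMD-capable compiler might be able to speed up.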
Anyway, it is interesting to learn how sucky the x86 architecture can be (the hard way, of course :-) ).

PS: regarding the optimization options, Steve H. suggested

-O6 -fomit-frame-pointer -fstrength-reduce -funroll-loops -fmove-all-movables -ffast-math -mcpu=i686 -march=i686

Can gcc 3.2 target architectures higher than the PII? (I mean, can it generate P3/P4-specific code?)

Thoughts?

Benno

--
http://linuxsampler.sourceforge.net
Building a professional grade software sampler for Linux.
Please help us design and develop it.
