Fwd: Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-21 Thread Khudyakov Alexey
Oh, I've sent the mail to the wrong address again.
--  Forwarded Message  --

On Saturday 21 February 2009 02:42:11 you wrote:
 On Sat, Feb 21, 2009 at 12:22 AM, Bulat Ziganshin
 bulat.zigans...@gmail.com wrote:

  Hello Khudyakov,

  Saturday, February 21, 2009, 2:07:39 AM, you wrote:
   I have another question. Why shouldn't compiler realize that
   `sum [1..10^9]' is constant and thus evaluate it at compile time?
 
  since we expect that compilation will be done in reasonable amount of
  time. you cannot guarantee this for list-involving computation

 it would be nice to have a compiler that can run forever, incrementally
 generating faster and faster versions of the same program, until you press
 a key or a timeout is reached.

 then you just let it run before you get to bed ;-)

 you could even pass it in a test data set to which it must be optimized;
 after the program is compiled, the compiler runs and profiles it, measures
 the results, and does another pass to make it faster.

I've just remembered another, related approach to optimization. It uses a
genetic algorithm to find a close-to-best set of optimization options.
Alternatively, it could be used to find badly interacting options
(pessimizations).

An implementation for gcc is here:
http://www.coyotegulch.com/products/acovea/

I haven't actually tried it, but I like the idea.

--
  Khudyakov Alexey

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Fwd: Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-21 Thread Bulat Ziganshin
Hello Khudyakov,

Sunday, February 22, 2009, 12:58:59 AM, you wrote:

 you could even pass it in a test data set to which it must be optimized;
 after the program is compiled, the compiler runs and profiles it, measures
 the results, and does another pass to make it faster.

it is supported in gcc4 and icl at least

 I've just remembered another but related approach to optimization. It uses
 genetic algorithm to determine close to the best set of optimization options.

afaik it is widely used for tuning parameters of compression algorithms


-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Miguel Mitrofanov

Ahem. Seems like you've included time spent on the runtime loading.

My results:

MigMit:~ MigMit$ gcc -o test -O3 -funroll-loops test.c && time ./test
-1243309312
real    0m0.066s
user    0m0.063s
sys     0m0.002s
MigMit:~ MigMit$ rm test; ghc -O2 --make test.hs && time ./test
Linking test ...
-243309312

real    0m3.201s
user    0m3.165s
sys     0m0.017s

While 3.201 vs. 0.066 seem to be a huge difference, 0.017 vs. 0.002 is  
not that bad.


On 20 Feb 2009, at 16:29, Bulat Ziganshin wrote:


Hello haskell-cafe,

since there are no objective tests comparing ghc to gcc, i made my own
one. these are 3 programs, calculating sum in c++ and haskell:

main = print $ sum[1..10^9::Int]


main = print $ sum0 (10^9) 0

sum0 :: Int -> Int -> Int
sum0 0  !acc = acc
sum0 !x !acc = sum0 (x-1) (acc+x)


main()
{
 int sum=0;
 //for(int j=0; j<100;j++)
   for(int i=0; i<1000*1000*1000;i++)
 sum += i;
 return sum;
}

execution times:
sum:
  ghc 6.6.1 -O2   : 12.433 secs
  ghc 6.10.1 -O2  : 12.792 secs
sum-fast:
  ghc 6.6.1 -O2   :  1.919 secs
  ghc 6.10.1 -O2  :  1.856 secs
  ghc 6.10.1 -O2 -fvia-C  :  1.966 secs
C++:
  gcc 3.4.5 -O3 -funroll-loops:  0.062 secs


--
Best regards,
Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Miguel Mitrofanov

Forget it, my bad.

On 20 Feb 2009, at 16:48, Miguel Mitrofanov wrote:


Ahem. Seems like you've included time spent on the runtime loading.

My results:

MigMit:~ MigMit$ gcc -o test -O3 -funroll-loops test.c && time ./test
-1243309312
real    0m0.066s
user    0m0.063s
sys     0m0.002s
MigMit:~ MigMit$ rm test; ghc -O2 --make test.hs && time ./test
Linking test ...
-243309312

real    0m3.201s
user    0m3.165s
sys     0m0.017s

While 3.201 vs. 0.066 seem to be a huge difference, 0.017 vs. 0.002  
is not that bad.


On 20 Feb 2009, at 16:29, Bulat Ziganshin wrote:


Hello haskell-cafe,

since there are no objective tests comparing ghc to gcc, i made my own
one. these are 3 programs, calculating sum in c++ and haskell:

main = print $ sum[1..10^9::Int]


main = print $ sum0 (10^9) 0

sum0 :: Int -> Int -> Int
sum0 0  !acc = acc
sum0 !x !acc = sum0 (x-1) (acc+x)


main()
{
int sum=0;
//for(int j=0; j<100;j++)
  for(int i=0; i<1000*1000*1000;i++)
sum += i;
return sum;
}

execution times:
sum:
 ghc 6.6.1 -O2   : 12.433 secs
 ghc 6.10.1 -O2  : 12.792 secs
sum-fast:
 ghc 6.6.1 -O2   :  1.919 secs
 ghc 6.10.1 -O2  :  1.856 secs
 ghc 6.10.1 -O2 -fvia-C  :  1.966 secs
C++:
 gcc 3.4.5 -O3 -funroll-loops:  0.062 secs


--
Best regards,
Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Dan Doel
 Test.hs 

import Prelude hiding (sum, enumFromTo)

import Data.List.Stream (sum, unfoldr)

enumFromTo m n = unfoldr f m
 where f k | k <= n    = Just (k,k+1)
           | otherwise = Nothing

main = print . sum $ enumFromTo 1 (10^9 :: Int)

 snip 

do...@zeke % time ./Test
500000000500000000
./Test  3.12s user 0.03s system 80% cpu 3.922 total
do...@zeke % time ./Test-sum0
500000000500000000
./Test-sum0  3.47s user 0.02s system 80% cpu 4.348 total
do...@zeke % time ./Test-sum0
500000000500000000
./Test-sum0  3.60s user 0.02s system 90% cpu 4.009 total
do...@zeke % time ./Test
500000000500000000
./Test  3.11s user 0.02s system 81% cpu 3.846 total

 snip 

Test-sum0 is with the sum0 function

Test is the code at the top of this mail.

-fvia-C -optc-O3 didn't seem to make a big difference with either Haskell
example, so they're both with the default backend.
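
For readers without the stream-fusion package, the shape of the Test.hs above can be sketched using only base. This is an illustration under assumptions: enumFromTo' is a hypothetical name, the upper bound is reduced to 10^6 so it runs quickly, and base's unfoldr is not expected to fuse the way Data.List.Stream does, so the timings in this mail do not carry over.

```haskell
import Data.List (foldl', unfoldr)

-- Same shape as the enumFromTo in the mail, but built on base's unfoldr.
-- (Hypothetical name; the primed version avoids clashing with the Prelude.)
enumFromTo' :: Int -> Int -> [Int]
enumFromTo' m n = unfoldr step m
  where
    step k
      | k <= n    = Just (k, k + 1)  -- emit k, continue from k + 1
      | otherwise = Nothing          -- past the upper bound: stop

main :: IO ()
main = print (foldl' (+) 0 (enumFromTo' 1 1000000))
```

The strict left fold keeps the accumulator evaluated at each step; whether the intermediate list is eliminated depends on the compiler version and flags.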

Your C++ code runs slowly on my system (around 1 second), but that's because 
it uses 32-bit ints, I guess (switching to long int sped it up).

-- Dan


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Dan Doel
Sorry for replying to myself, but I got suspicious about the 6ms runtime of 
the 64-bit C++ code on my machine. So I looked at the assembly and found this:

.LCFI1:
        movabsq $499999999500000000, %rsi
        movl    $_ZSt4cout, %edi
        pushq   %r12

I'm no assembly guru, but that makes me think that there's no actual 
computation going on in the runtime for the 64-bit C++ program, whereas the 
32-bit one is clearly doing work on my system, since it takes around 1 second.

Not that I'd be sad if GHC could reduce that whole constant at compile time, 
but GCC isn't doing 1 billion adds in 6 (or even 60) milliseconds.
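
As a cross-check on what a fully folded loop would have to load, the exact sums can be computed from the closed form n(n-1)/2. This is only a sketch to verify the constants; it does not claim to show how GCC arrives at its result, and sumBelow and sumUpTo are hypothetical helper names.

```haskell
-- Exact value of 0 + 1 + ... + (n - 1), the sum the C++ loop computes.
sumBelow :: Integer -> Integer
sumBelow n = n * (n - 1) `div` 2

-- Exact value of 1 + 2 + ... + n, the sum the Haskell programs compute.
sumUpTo :: Integer -> Integer
sumUpTo n = n * (n + 1) `div` 2

main :: IO ()
main = do
  print (sumBelow (10 ^ 9))  -- 499999999500000000
  print (sumUpTo (10 ^ 9))   -- 500000000500000000
```

Both values fit in a 64-bit integer but not in a 32-bit one, which is consistent with the 32-bit and 64-bit builds behaving differently.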


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread David Leimbach
On Fri, Feb 20, 2009 at 6:39 AM, Dan Doel dan.d...@gmail.com wrote:

 Sorry for replying to myself, but I got suspicious about the 6ms runtime of
 the 64-bit C++ code on my machine. So I looked at the assembly and found
 this:

.LCFI1:
        movabsq $499999999500000000, %rsi
        movl    $_ZSt4cout, %edi
        pushq   %r12

 I'm no assembly guru, but that makes me think that there's no actual
 computation going on in the runtime for the 64-bit C++ program, whereas the
 32-bit one is clearly doing work on my system, since it takes around 1
 second.

 Not that I'd be sad if GHC could reduce that whole constant at compile
 time,
 but GCC isn't doing 1 billion adds in 6 (or even 60) milliseconds.


The GCC optimizer must know that you can't return a value that large to
user space as a return result.

In Haskell you're printing it... why not print it in C++?





Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Dan Doel
On Friday 20 February 2009 10:52:03 am David Leimbach wrote:
 The GCC optimizer must know that you can't return a value to user space of
 that large as a return result.

 In Haskell you're printing it... why not print it in C++?

I actually changed my local copy to print out the result (since I wanted to 
make sure it was using 64 bit ints). It didn't make a difference in the timing 
(of either the 32 or 64 bit version).


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Don Stewart
bulat.ziganshin:
 Hello haskell-cafe,
 
 since there are no objective tests comparing ghc to gcc, i made my own
 one. these are 3 programs, calculating sum in c++ and haskell:

Wonderful. Thank you!
  
 main = print $ sum[1..10^9::Int]


This won't be comparable to your loop below, as 'sum' is a left fold
(which doesn't fuse under build/foldr).

You should use the list implementation from the stream-fusion package (or
uvector) if you're expecting it to fuse to the following loop:
  
 main = print $ sum0 (10^9) 0
 
 sum0 :: Int -> Int -> Int
 sum0 0  !acc = acc
 sum0 !x !acc = sum0 (x-1) (acc+x)


Note the bang patterns aren't required here. It compiles to the
following core:

$wsum0 :: Int# -> Int# -> Int#
$wsum0 =
  \ (ww_sON :: Int#) (ww1_sOR :: Int#) ->
  case ww_sON of ds_XD0 {
    _ -> $wsum0 (-# ds_XD0 1) (+# ww1_sOR ds_XD0);
    0 -> ww1_sOR

which is perfect.
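
To make the point about bang patterns concrete, here is a minimal sketch of the loop written both ways. The names sumBang and sumPlain are hypothetical, and the bound is reduced to 10^6. The assumption, in line with the core above, is that at -O2 the strictness analyser treats both arguments as strict either way; without optimisation the plain version may build up thunks in the accumulator.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The loop as posted, with bang patterns forcing both arguments.
sumBang :: Int -> Int -> Int
sumBang 0  !acc = acc
sumBang !x !acc = sumBang (x - 1) (acc + x)

-- The same loop without bang patterns; strictness is left to the compiler.
sumPlain :: Int -> Int -> Int
sumPlain 0 acc = acc
sumPlain x acc = sumPlain (x - 1) (acc + x)

main :: IO ()
main = do
  print (sumBang 1000000 0)
  print (sumPlain 1000000 0)
```

Both definitions compute the same value; any difference would show up only in the generated code and memory behaviour.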

Main_zdwsum0_info:
  testq   %rsi, %rsi
  movq    %rsi, %rax
  jne     .L2
  movq    %rdi, %rbx
  jmp     *(%rbp)
.L2:
  leaq    -1(%rsi), %rsi
  addq    %rax, %rdi
  jmp     Main_zdwsum0_info

Which seems ... OK.

$ ghc-core A.hs -fvia-C -optc-O3
$ time ./A
500000000500000000
./A  1.12s user 0.00s system 99% cpu 1.127 total
  
Works for me. That's on linux x86_64, gcc 4.4

Trying -fasm:

Main_zdwsum0_info:
.LcQs:
  movq %rsi,%rax
  testq %rax,%rax
  jne .LcQw
  movq %rdi,%rbx
  jmp *(%rbp)
.LcQw:
  movq %rdi,%rcx
  addq %rax,%rcx
  leaq -1(%rax),%rsi
  movq %rcx,%rdi
  jmp Main_zdwsum0_info

$ time ./A
500000000500000000
./A  1.65s user 0.00s system 98% cpu 1.677 total

Is a bit slower.

 main()
 {
   int sum=0;
   //for(int j=0; j<100;j++)
 for(int i=0; i<1000*1000*1000;i++)
   sum += i;
   return sum;
 }


Well, that's a bit different. It doesn't print the result, and it returns a
different result on 64 bit.
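
The different results are consistent with plain integer wrap-around. A small sketch, assuming a 32-bit C int and a 64-bit Haskell Int (as32 and as64 are hypothetical helper names): truncating the exact sums to 32 bits reproduces the negative numbers seen in this thread.

```haskell
import Data.Int (Int32, Int64)

-- Exact value of 0 + 1 + ... + (n - 1), computed without overflow.
exactSumBelow :: Integer -> Integer
exactSumBelow n = n * (n - 1) `div` 2

-- fromInteger on a fixed-width type keeps only the low bits (two's complement).
as32 :: Integer -> Int32
as32 = fromInteger

as64 :: Integer -> Int64
as64 = fromInteger

main :: IO ()
main = do
  let cLoop  = exactSumBelow (10 ^ 9)  -- what the C loop adds up (0 .. 10^9 - 1)
      hsLoop = cLoop + 10 ^ 9          -- 1 .. 10^9, what the Haskell sum adds up
  print (as32 cLoop)   -- -1243309312
  print (as32 hsLoop)  -- -243309312
  print (as64 hsLoop)  -- 500000000500000000
```

So a 32-bit run and a 64-bit run can both be correct for their integer width while printing different numbers.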


$ gcc -O0 t.c
$ time ./a.out 
-1243309312
./a.out  3.99s user 0.00s system 88% cpu 4.500 total

$ gcc -O1 t.c
$ time ./a.out
-1243309312
./a.out  0.88s user 0.00s system 99% cpu 0.892 total

$ gcc -O3 -funroll-loops t.c 
$ time ./a.out
-1243309312
./a.out  0.31s user 0.00s system 97% cpu 0.318 total

I don't get anything near the 0.062s which is interesting.
The print statement slows things down, I guess...

So we have:

ghc -fvia-C -O2          1.127
ghc -fasm                1.677
gcc -O0                  4.500
gcc -O3 -funroll-loops   0.318

So, some lessons. GHC is around 3-4x slower on this tight loop (which isn't as
bad as it used to be).

That's actually a worse margin than any current shootout program, where we are
no worse than 2.9x slower on larger things:


http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=gcc&box=1

 
 execution times:
  sum:
ghc 6.6.1 -O2   : 12.433 secs
ghc 6.10.1 -O2  : 12.792 secs
  sum-fast:
ghc 6.6.1 -O2   :  1.919 secs
ghc 6.10.1 -O2  :  1.856 secs
ghc 6.10.1 -O2 -fvia-C  :  1.966 secs
  C++:
gcc 3.4.5 -O3 -funroll-loops:  0.062 secs
 

I couldn't reproduce your final number. 

Now, given GHC gets most of the way there -- I think this might make a good bug
report against GHC head, so we can see if the new register allocator helps any.

http://hackage.haskell.org/trac/ghc/newticket?type=bug

Thanks for the report, Bulat!

-- Don


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Don Stewart
bulat.ziganshin:
 Friday, February 20, 2009, 7:41:33 PM, you wrote:
 
  main = print $ sum[1..10^9::Int]
 
  This won't be comparable to your loop below, as 'sum' is a left fold
  (which doesn't fuse under build/foldr).
 
  You should use the list implementation from the stream-fusion package (or
  uvector) if you're expecting it to fuse to the following loop:
 
 it was comparison of native haskell, low-level haskell (which is
 harder to write than native C) and native C. stream-fusion and any
 other packages provides libraries for some tasks but they can't make faster
 maps, for example. so i used plain list


Hmm? Maybe you're not familiar with the state of the art?

$ cabal install uvector

Write a loop at a high level:

import Data.Array.Vector

main = print (sumU (enumFromToU 1 (10^9 :: Int)))
   
Compile it:

$ ghc-core A.hs -O2 -fvia-C -optc-O3

Yielding:

s16h_info:
  cmpq    6(%rbx), %rdi
  jg      .L2
  addq    %rdi, %rsi
  leaq    1(%rdi), %rdi
  jmp     s16h_info

Running:

$ time ./A
500000000500000000
./A  0.97s user 0.01s system 99% cpu 0.982 total


Now, (trying to avoid the baiting...) this is actually *very*
interesting. Why is this faster than the manual recursion we did earlier?
Why do we get better assembly? Again, if you stick to specifics, there are some
interesting things we can learn here.

  
  Which seems ... OK.
 
 really? :D

No, see above.

  
  I don't get anything near the 0.062s which is interesting.
 
 it was beautiful gcc optimization - it added 8 values at once. with
 xor results are:
 
 xor.hs  12.605
 xor-fast.hs  1.856
 xor.cpp  0.339


GCC is a good loop optimiser. But apparently not my GCC.

  
  So we have:
 
  ghc -fvia-C -O2 1.127
  ghc -fasm   1.677
  gcc -O0 4.500
  gcc -O3 -funroll-loops  0.318
 
 why not compare to ghc -O0? also you can disable loop unrolling in gcc
 and unroll loops manually in haskell. or you can generate asm code on
 the fly. there are plenty of tricks to prove that gcc generates bad
 code :D


No, we want to show (I imagine) that GHC is within a factor of two of C.
I usually set my benchmark to beat gcc -O0 fwiw, and then hope to be within
2x of optimised C. I'm not sure what your standards are.

  
  So. some lessons. GHC is around 3-4x slower on this tight loop. (Which 
  isn't as
  bad as it used to be).
 
 really? what i see: low-level haskell code is usually 3 times harder
 to write and 3 times slower than gcc code. native haskell code is tens
 to thousands times slower than C code (just recall that real programs
 use type classes and monads in addition to laziness)


Thousands of times? Now you're just undermining your own credibility
here. Stick to what you can measure. If anything we'd expect GCC's magic loop
skillz to be less useful on large code bases.

  
  That's actually a worse margin than any current shootout program, where we 
  are no
  worse than 2.9 slower on larger things:
 
 1) most benchmarks there depend on libraries speed. in one test, for
 example, php is winner
 2) for the sum program ghc libs was modified to win in benchmark


It is interesting that the  2.9x slower in the shootout is pretty much what
we found in this benchmark too. 

 3) the remaining 1 or 2 programs that measure speed of ghc-generated
 code was hardly optimized using low-level code, so they don't have
 anything common with real haskell code most of us write every day


Depends on where you work.

  
  Now, given GHC gets most of the way there -- I think this might make a good 
  bug
  report against GHC head, so we can see if the new register allocator helps 
  any.
 
 you mean that 6.11 includes new allocator? in that case you can
 test it too

Yes.


http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeGen
  

 i believe that ghc developers are able to test sum performance without my
 bugreports :D

No! This is not how open source works! You *should submit bug reports* and 
*analysis*.
It is so so much more useful than complaining and throwing stones.

-- Don


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Daniel Fischer
On Friday, 20 February 2009 18:10, Bulat Ziganshin wrote:
 Hello Don,

 Friday, February 20, 2009, 7:41:33 PM, you wrote:
  main = print $ sum[1..10^9::Int]
 
  This won't be comparable to your loop below, as 'sum' is a left fold
  (which doesn't fuse under build/foldr).
 
  You should use the list implementation from the stream-fusion package (or
  uvector) if you're expecting it to fuse to the following loop:

 it was comparison of native haskell, low-level haskell (which is
 harder to write than native C)

Um, not always, and certainly not in cases like this, at least if you call the 
simple worker loop low-level Haskell.

 and native C. stream-fusion and any
 other packages provides libraries for some tasks but they can't make faster
 maps, for example. so i used plain list

Which is of course comparable with a tight loop in C, isn't it?
Really, you hurt your cause by including that.
You said you wanted to compare ghc to gcc, then compare what they do to 
comparable code.


  Which seems ... OK.

 really? :D

  Well, that's a bit different. It doesn't print the result, and it returns
  a different results on 64 bit

 doesn't matter for testing speed

  I don't get anything near the 0.062s which is interesting.

 it was beautiful gcc optimization - it added 8 values at once. with
 xor results are:

 xor.hs  12.605
 xor-fast.hs  1.856
 xor.cpp  0.339

  The print statement slows things down, I guess...

 are you really believe that printing one number needs so much time? :)

  So we have:
 
  ghc -fvia-C -O2 1.127
  ghc -fasm   1.677
  gcc -O0 4.500
  gcc -O3 -funroll-loops  0.318

 why not compare to ghc -O0? also you can disable loop unrolling in gcc
 and unroll loops manually in haskell. or you can generate asm code on
 the fly. there are plenty of tricks to prove that gcc generates bad
 code :D

That's not what he's doing at all. Sure, he's not comparing Haskell code 
compiled without optimisations, but he also includes gcc with highest 
optimisation level. Read the gcc -O0 figure as an indication of what 
optimisations can achieve.


  So. some lessons. GHC is around 3-4x slower on this tight loop. (Which
  isn't as bad as it used to be).

 really? what i see: low-level haskell code is usually 3 times harder
 to write and 3 times slower than gcc code.

I deny that low-level Haskell code is three times harder to write than 
ordinary C code, at least if we consider the worker/wrapper idiom low-level 
Haskell.
It is also my experience that gcc usually creates faster executables from good 
C code than ghc does from corresponding ordinary Haskell code (not using 
#-magic), but the margin does vary wildly.

 native haskell code is tens
 to thousands times slower than C code (just recall that real programs
 use type classes and monads in addition to laziness)

Okay, tens is realistic, but thousands?
Of course if you compare a tight loop that doesn't allocate to creating 
thousands of millions of cons-cells...
Just because lists are easier to use in Haskell than in any other language I 
know doesn't mean it's necessary to use lists when writing Haskell if other 
ways are more fitting for the goal.

Just for the record, timings on my machine, gcc-3.3 vs. ghc-6.8.3:
$ ./runtests
Sums in C, first counting up, then down
with -O0
-243309312

real    0m6.751s
user    0m6.660s
sys     0m0.020s
-243309312

real    0m6.318s
user    0m6.190s
sys     0m0.000s
with -O1
-243309312

real    0m2.533s
user    0m2.530s
sys     0m0.010s
-243309312

real    0m1.744s
user    0m1.700s
sys     0m0.000s
with -O2
-243309312

real    0m1.744s
user    0m1.710s
sys     0m0.000s
-243309312

real    0m1.687s
user    0m1.680s
sys     0m0.000s
with -O3
-243309312

real    0m1.753s
user    0m1.720s
sys     0m0.000s
-243309312

real    0m1.701s
user    0m1.700s
sys     0m0.000s
Sums in Haskell
First compiled with -O2, then with -O2 -fvia-C -optc-O3
Using uvector
-243309312

real    0m7.412s
user    0m7.290s
sys     0m0.000s
-243309312

real    0m5.726s
user    0m5.650s
sys     0m0.000s
Loop down with BangPatterns
-243309312

real    0m4.789s
user    0m4.750s
sys     0m0.010s
-243309312

real    0m4.561s
user    0m4.470s
sys     0m0.000s
Loop down without BangPatterns
-243309312

real    0m5.092s
user    0m4.890s
sys     0m0.000s
-243309312

real    0m4.747s
user    0m4.540s
sys     0m0.010s
Loop up (with BangPatterns)
-243309312

real    0m5.511s
user    0m5.320s
sys     0m0.000s
-243309312

real    0m4.449s
user    0m4.410s
sys     0m0.000s
Using strict left fold
-243309312

real    2m45.625s
user    2m41.930s
sys     0m0.260s
-243309312

real    2m43.890s
user    2m41.550s
sys     0m0.280s
Fully naive
-243309312

real    2m45.657s
user    2m42.980s
sys     0m0.250s
-243309312

real    2m42.403s
user    2m40.160s
sys     0m0.370s
Done


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread John Meacham
Don't forget jhc:

on my machine (with 'print' equivalent added to C one to be fair, and
10^9 changed to 1000*1000*1000 just like the C one)

ghc: (-O2)
time ./foo
./foo  2.26s user 0.00s system 99% cpu 2.273 total

gcc: 
time ./a.out
./a.out  0.34s user 0.00s system 99% cpu 0.341 total

jhc:
time ./hs.out
./hs.out  0.33s user 0.00s system 96% cpu 0.347 total

Yay! it is good to see my goal of C-equivalent performance starting to
come true :)

John




-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread Ketil Malde
Bulat Ziganshin bulat.zigans...@gmail.com writes:

 Don't forget jhc:

 i was pretty sure that jhc will be as fast as gcc :) unfortunately,
 jhc isn't our production compiler

Neither is GCC :-)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Khudyakov Alexey
On Friday 20 February 2009 16:29:29 Bulat Ziganshin wrote:
 Hello haskell-cafe,

 since there are no objective tests comparing ghc to gcc, i made my own
 one. these are 3 programs, calculating sum in c++ and haskell:

 main = print $ sum[1..10^9::Int]

 ... skipped ...

The discussion is mostly about low-level optimizations such as loop unrolling.

I have another question. Why shouldn't the compiler realize that `sum [1..10^9]'
is a constant and thus evaluate it at compile time?
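
One opt-in way to get this effect is a Template Haskell splice, which runs an expression while the module is being compiled and embeds the result as a literal. This is a sketch under assumptions, not something the compiler does on its own: the name precomputed is hypothetical, and the bound is reduced to 10^6 because compile time grows with the size of the computation.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH.Syntax (lift)

-- The splice is evaluated at compile time; the binary only contains
-- the resulting integer literal, not the list or the summation.
precomputed :: Int
precomputed = $(lift (sum [1 .. 10 ^ 6 :: Int]))

main :: IO ()
main = print precomputed
```

The trade-off is visible directly: the work is not removed, only moved from run time into the build.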

-- 
  Khudyakov Alexey


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread John Meacham
On Sat, Feb 21, 2009 at 02:24:59AM +0300, Bulat Ziganshin wrote:
 Hello John,
 
 Saturday, February 21, 2009, 2:14:25 AM, you wrote:
 
  Heh. He probably meant something more like jhc is not a production
  compiler which is true for a lot of projects. For projects of
  substantial size or that require many extensions, jhc falls somewhat
  short. It is getting better though. Of course, help is always
  appreciated. :)
 
 what is substantial size? can jhc be used for video codec, i.e.
 probably no extensions - just raw computations, and thousands or tens
 of thousands LOCs?

Perhaps. A bigger issue in practice is that the larger a program is, the
more likely it is to depend on some library that depends on a ghc
extension. However, base is almost 1 lines and jhc can compile that
into a library without too much effort nowadays, so it might scale.
If you try and find it fails, then please submit a bug report to
j...@haskell.org. Too many bugs go unreported I find.

If the haskell code has an interface that is strict and unboxable (i.e.
only unboxable values passed, such as a video codec passing floats might
be) then compiling it with jhc and foreign exporting the functions then
foreign importing them into ghc for the bulk of the program would
probably work. Probably not worth the effort, but could be an
interesting experiment.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc

2009-02-20 Thread Xiao-Yong Jin
Peter Verswyvelen bugf...@gmail.com writes:

 you could even pass it in a test data set to which it must be optimized; 
 after the program is compiled,
 the compiler runs and profiles it, measures the results, and does another 
 pass to make it faster.

 some C++ compilers can already do this (profile based optimization).

Rumor says firefox needs profile based optimization to run
faster.  Or it is not a rumor at all.

I guess that, for all the goodness of Haskell, heavy profile-based
optimization should be done, and it could probably result
in a much faster binary than C++.

Best,
Xiao-Yong
-- 
c/*__o/*
\ * (__
*/\  


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread Alberto G. Corona
John,
please update the section "All is not well in jhc-land", because
things are better now, aren't they?

2009/2/21 John Meacham j...@repetae.net

 On Sat, Feb 21, 2009 at 02:24:59AM +0300, Bulat Ziganshin wrote:
  Hello John,
 
  Saturday, February 21, 2009, 2:14:25 AM, you wrote:
 
   Heh. He probably meant something more like jhc is not a production
   compiler which is true for a lot of projects. For projects of
   substantial size or that require many extensions, jhc falls somewhat
   short. It is getting better though. Of course, help is always
   appreciated. :)
 
  what is substantial size? can jhc be used for video codec, i.e.
  probably no extensions - just raw computations, and thousands or tens
  of thousands LOCs?

 Perhaps. A bigger issue in practice is that the larger a program is, the
 more likely it is to depend on some library that depends on a ghc
 extension. However, base is almost 1 lines and jhc can compile that
 into a library without too much effort nowadays, so it might scale.
 If you try and find it fails, then please submit a bug report to
 j...@haskell.org. Too many bugs go unreported I find.

 If the haskell code has an interface that is strict and unboxable (i.e.
 only unboxable values passed, such as a video codec passing floats might
 be) then compiling it with jhc and foreign exporting the functions then
 foreign importing them into ghc for the bulk of the program would
 probably work. Probably not worth the effort, but could be an
 interesting experiment.

JOhn

 --
 John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread John Meacham
On Sat, Feb 21, 2009 at 03:21:03AM +0300, Bulat Ziganshin wrote:
  what is substantial size? can jhc be used for video codec, i.e.
  probably no extensions - just raw computations, and thousands or tens
  of thousands LOCs?
 
  Perhaps. A bigger issue in practice is that the larger a program is, the
  more likely it is to depend on some library that depends on a ghc
  extension.
 
 this is true for *application* code, but for codec you may have lots of
 code that just compute, compute, compute

Yes indeed. If there is code like this out there for haskell, I would
love to add it as a test case for jhc. I don't see a reason it wouldn't
compile to be as fast as C, with the caveat that the strictness analyzer
needs to be able to find all the unboxables. It sometimes needs some
help with well placed 'seq' statements, but that is true of ghc as well.
jhc does suffer a lot more than ghc though when it can't make things strict. 
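
As an illustration of the kind of "well placed seq" meant here, consider a loop with two accumulators. This is a generic sketch, not code from jhc or its test suite; mean is a hypothetical example function. Forcing both accumulators on every step means the loop does not depend on the strictness analyser discovering that they can be kept evaluated.

```haskell
-- Mean of a list using two accumulators (running sum and count).
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go s n []       = if n == 0 then 0 else s / fromIntegral n
    go s n (x : xs) =
      let s' = s + x
          n' = n + 1
      in  s' `seq` n' `seq` go s' n' xs  -- force both before recursing

main :: IO ()
main = print (mean [1 .. 1000000])
```

Without the seq calls the result is the same; the difference is only in whether unevaluated additions can pile up when the compiler fails to infer strictness.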

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread John Meacham
On Sat, Feb 21, 2009 at 01:20:14AM +0100, Alberto G. Corona  wrote:
 John,
 please update the section  All is not well in jhc-land because now
 things are better isn´t?

Ah, are you referring to this page?
http://repetae.net/computer/jhc/jhc.shtml
That is just there for historical reasons as my initial announcement. 

more up to date info is

in the manual: http://repetae.net/computer/jhc/manual.html
the becoming a developer page: http://repetae.net/computer/jhc/development.shtml
and the how do i just try it out page: 
http://repetae.net/computer/jhc/building.shtml


I guess it isn't clear that that original document is no longer up to date. I
will put a big warning on it.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread Alberto G. Corona
But it is very misleading. It would be nice to have  a log or
something similar to inform about the current state

 http://repetae.net/computer/jhc/jhc.shtml
 That is just there for historical reasons as my initial announcement.

 more up to date info is

 in the manual: http://repetae.net/computer/jhc/manual.html
 the becoming a developer page: 
 http://repetae.net/computer/jhc/development.shtml
 and the how do i just try it out page: 
 http://repetae.net/computer/jhc/building.shtml


 I guess it isn't clear that that original document is no longer up to date. I
 will put a big warning on it.

John

 --
 John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] speed: ghc vs gcc vs jhc

2009-02-20 Thread Don Stewart
bulat.ziganshin:
 Hello John,
 
 Saturday, February 21, 2009, 3:42:24 AM, you wrote:
 
  this is true for *application* code, but for codec you may have lots of
  code that just compute, compute, compute
 
  Yes indeed. If there is code like this out there for haskell, I would
  love to add it as a test case for jhc.
 
 Crypto library has a lot of native haskell code computing hashes and
 encrypting data
 
 hopefully people will show other examples
 
 btw, Galois Cryptol has haskell backend, are you know? with jhc
 compilation it can probably generate as fast code as C backend does.
 it will be very interesting for us and even look as something close to
 production usage. i have crossposted message to Don

That's a very interesting idea. The output from Cryptol is self
contained enough, and simple, numerical code, that JHC probably could
handle it -- it doesn't require extensive libraries or runtime support,
for example. This warrants investigation.

Thanks for the suggestion!

-- Don