Everyone is welcome to write 'better' code for the shootout. It's a good way to learn useful techniques, and to share them. I will add notes about the benefits that were to be had from using this ugly code.
Ben Lippmeier wrote:
> Donald Bruce Stewart wrote:
>
>>> Ha! I don't think "pure lazy fp" would be the phrase I would choose
>>> to describe this code.
>>>
>>> An example (from fannkuch):
>>>
>>>     t <- IO $ \s ->
>>>         case readIntOffAddr# p1# 0# s of
>>>             (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
>>>                 (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
>>
>> Ok, I'll bite ;) This is a very atypical "example", as fannkuch is the
>> only entry written this way (and it could be rewritten, with careful
>> attention to the Core). There are also many lovely pure, lazy entries.
>
> There are two Haskell entries for fannkuch. The "normal" one has this code:
>
>     rotate 2 (x1:x2:xs) = x2:x1:xs
>     rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs
>     rotate 4 (x1:x2:x3:x4:xs) = x2:x3:x4:x1:xs
>     rotate 5 (x1:x2:x3:x4:x5:xs) = x2:x3:x4:x5:x1:xs
>     rotate 6 (x1:x2:x3:x4:x5:x6:xs) = x2:x3:x4:x5:x6:x1:xs
>     rotate 7 (x1:x2:x3:x4:x5:x6:x7:xs) = x2:x3:x4:x5:x6:x7:x1:xs
>     rotate 8 (x1:x2:x3:x4:x5:x6:x7:x8:xs) = x2:x3:x4:x5:x6:x7:x8:x1:xs
>     rotate 9 (x1:x2:x3:x4:x5:x6:x7:x8:x9:xs) = x2:x3:x4:x5:x6:x7:x8:x9:x1:xs
>     rotate 10 (x1:x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) =
>         x2:x3:x4:x5:x6:x7:x8:x9:x10:x1:xs
>     rotate n (x:xs) = rotate' n xs
>       where rotate' 1 xs = x:xs
>             rotate' n (x:xs) = x:rotate' (n-1) xs

Which does not scale particularly well. The "normal" entry is 9.0x slower
than the leader, while the readIntOffAddr# entry is only 3.1x slower. So the
gain from explicit unboxing and state passing was substantial.

> Not so atypical.
>
> More examples (I didn't look /that/ hard.. :))
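As an aside on the rotate code quoted above: the hand-unrolled clauses can be collapsed into one definition built from splitAt. This is only a sketch of the same rotation (it is not a shootout entry), and it still allocates a fresh list per call, which is part of why the fast entries resort to unrolling or to raw memory:

```haskell
-- Move the head of the length-n prefix to position n, leaving the
-- rest of the list untouched, e.g. rotate 3 [1,2,3,4] == [2,3,1,4].
-- Equivalent to the unrolled clauses quoted above, but O(n) pattern
-- matching is replaced by a single splitAt.
rotate :: Int -> [a] -> [a]
rotate _ []     = []
rotate n (x:xs) = front ++ x : back
  where (front, back) = splitAt (n - 1) xs
```

The unrolled clauses in the entry are a manual specialisation of exactly this traversal for n up to 10.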
> -----------------------------------
> * From reverse-complement
>
>     reverseit iubc strand i l s =
>         if i >=# l
>             then (# s, (I# i, I# l) #)
>             else case readWord8OffAddr# strand i s of { (# s, c #) ->
>                  case readWord8OffAddr# strand l s of { (# s, x #) ->
>                  case readWord8OffAddr# iubc (word2Int# x) s of { (# s, y #) ->
>                  case readWord8OffAddr# iubc (word2Int# c) s of { (# s, z #) ->
>                  case writeWord8OffAddr# strand i y s of { s ->
>                  case writeWord8OffAddr# strand l z s of { s ->
>                  reverseit iubc strand (i +# 1#) (l -# 1#) s
>                  } } } } } }

Again, there is a "normal" entry, with a screen full of the definition of

>     complement :: Base -> Base
>     complement 'A' = 'T'
>     complement 'a' = 'T'
>     complement 'C' = 'G'
>     complement 'c' = 'G'
>     complement 'G' = 'C'
>     ...

The "normal" entry was 32x slower than the leader and used 31x the memory.
The optimized entry runs 6.3x slower and uses 1.1x the memory. The fast
entry uses only the state & unboxing seen above; the rest of the program is
normal Haskell, e.g. the one line that replaces a screen full:

>     pairs = map (c2w *** c2w)
>         [('A','T'),('C','G'),('B','V'),('D','H'),('K','M'),('R','Y'),('\0','\0')]
>
> * From k-nucleotide.
>
>     eqmem i ptr1 ptr2 s = if i ==# 0# then (# s, True #) else
>         case readInt8OffAddr# ptr1 0# s of { (# s, i8a #) ->
>         case readInt8OffAddr# ptr2 0# s of { (# s, i8b #) ->
>         if i8a ==# i8b
>             then eqmem (i -# 1#) (plusAddr# ptr1 1#) (plusAddr# ptr2 1#) s
>             else (# s, False #) } }

I wrote that, copying the technique from Don. That is what a fast packed
string library which compares ASCII bytes should do internally. In C code it
is exactly the 'memcmp' function. And yet the GHC 6.4.1 distribution does
not have this function. This is an example of the library missing things
that it should have. The "normal" Haskell entry was 26x slower and 29x
bigger than the leader. The replacement code above is 22x and 7.1x instead.
Note that I did *not* get a big speed increase.
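(Aside: since the libraries lack the comparison, one way to get the 'memcmp' behaviour without hand-rolling eqmem out of readInt8OffAddr# is a direct FFI binding to C. A sketch; the names eqMem and c_memcmp here are made up for illustration, not taken from any shootout entry:)

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch: bind C's memcmp from string.h instead of writing the byte
-- loop by hand with unboxed primops.
import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Array (withArray)   -- handy for callers/tests
import Foreign.Ptr (Ptr)

foreign import ccall unsafe "string.h memcmp"
    c_memcmp :: Ptr Word8 -> Ptr Word8 -> CSize -> IO CInt

-- True when the first n bytes at the two addresses are equal,
-- i.e. the job eqmem above does one byte at a time.
eqMem :: Ptr Word8 -> Ptr Word8 -> Int -> IO Bool
eqMem p q n = fmap (== 0) (c_memcmp p q (fromIntegral n))
```

A packed string library would wrap exactly this call behind its Eq/Ord instances.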
The lack of a big speed increase is because I must use the provided
Data.HashTable, which is simply not competitive with other languages' hash
tables. This shootout benchmark has highlighted the deficiency.

> * From n-body.
>
>     kineticE i = let i' = (.|. (shiftL i 3))
>                  in do m <- mass i
>                        vx <- unsafeRead b (i' vx)
>                        vy <- unsafeRead b (i' vy)
>                        vz <- unsafeRead b (i' vz)
>                        return $! 0.5 * m * (vx*vx + vy*vy + vz*vz)

I wrote that out of desperation. Einar has a previous entry that is
beautiful Haskell, creating an "open mutable records" system just for this
benchmark. It ran in 18x time, 5.8x space. Many of us put entries on the
wiki which tried different ways to approach it (arrays of mutable data;
mutable arrays of constant data). None were able to beat Einar's entry.
Eventually Don and I wrote a flat ptr-to-array-of-doubles version, doing its
own offset computation using (shiftL) and (.|.) as you see above. This runs
in 6.6x time and 4.1x space. So without resorting to state & unboxing we got
nearly triple the speed, though we are still half the speed of OCaml here.

> ----------------------
>
> *I* am certainly not implying that Haskell is anything less than the
> most wondrous language in the entire world.

The neat thing is that for speed we do not drop out into C code or
assembly. We just use different parts of Haskell. The mutable state for the
n-body code is simple, using the IO features to act on a ptr to machine
doubles.

> I'm saying that there's a stark difference in style between the programs
> submitted to the shootout, and the ones I would show to people that I
> myself was trying to introduce to the wonders of purely lazy functional
> programming. :)

True. The ugly code above is not wondrous. I will assert, though, that the
use of explicit IO-state passing and unboxed code is good for teaching. If
you want to explain the IO monad, you will end up talking about the state of
the "real world" and how nice it is that you don't have to pass it around,
so that pure functions can peek/poke at it.
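That teaching story can be made concrete with a toy model. This is not GHC's actual IO type (the real one threads an unboxed State# RealWorld token); it is just an illustrative boxed state-passing version of the same idea:

```haskell
-- A toy model of IO as explicit world passing. World is a boxed
-- stand-in for GHC's State# RealWorld token; MyIO/bindIO/returnIO
-- are illustrative names, not GHC's definitions.
data World = World

newtype MyIO a = MyIO { runMyIO :: World -> (World, a) }

-- bindIO threads the world through each step, exactly like the
-- chained `case ... of (# s, x #) -> ...` in the shootout code above.
bindIO :: MyIO a -> (a -> MyIO b) -> MyIO b
bindIO (MyIO m) k = MyIO $ \s ->
    case m s of
        (s', a) -> runMyIO (k a) s'

returnIO :: a -> MyIO a
returnIO a = MyIO $ \s -> (s, a)
```

Do-notation desugars to exactly these two operations, which is why the hand-written case chains above look the way they do.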
By showing someone the examples above, they get to read what their IO
do-notation means once it has been desugared to a lower level than (>>=).
Another way to look at it is that "readIntOffAddr# p1# 0# s" is making use
of a nominally pure function, where peekPtr / pokePtr are impure.

> I think there's a big-fat-lesson about the tension between abstraction
> and implementation in these entries.
>
> On one hand we've got "This is what I want", on the other it's "What do
> I have to do to implement it".
>
> Ben.

You have the truth of it. Haskell wins the lines-of-code metric by a large
margin thanks to its abstraction, but entries that run into big performance
problems get parts rewritten in lower-level Haskell.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe