Beginners Digest, Vol 27, Issue 27

beginners-request Sun, 12 Sep 2010 04:58:04 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Arrays in Haskell (Gaius Hammond)
   2. Re:  Stack space overflow with foldl' (Ryan Prichard)
   3. Re:  Stack space overflow with foldl' (Daniel Fischer)
   4. Re:  Arrays in Haskell (Henry Olders)
   5. Re:  Arrays in Haskell (Tobias Brandt)
   6.  Again on List Manipulation (Lorenzo Isella)


----------------------------------------------------------------------

Message: 1
Date: Sat, 11 Sep 2010 18:30:46 +0100
From: Gaius Hammond <[email protected]>
Subject: Re: [Haskell-beginners] Arrays in Haskell
To: Henry Olders <[email protected]>
Cc: Lorenzo Isella <[email protected]>,  "[email protected]"
        <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Well, yes and no. Remember that NumPy (I am also a user) does most of  
its hard work in C. That is why you use a NumPy array and not a normal  
Python array.


It also depends on what you are doing with your data. If you have an  
immutable list and want to map a (pure) function over it, then that  
can be trivially parallelized in Haskell. Iterating over an array  
modifying each element in place is much harder. Remember that lists  
are always fast for sequential read-only access.


Cheers,


G




On 11 Sep 2010, at 17:28, Henry Olders wrote:

> Hi, Lorenzo!
>
> I am also a Haskell beginner, coming from Python, where I also used  
> numpy for speed.
>
> In my python applications, I used lists and list operations  
> (especially list comprehensions) which I found an elegant way of  
> doing things. A numpy array was simply, for my needs, a list of lists.
>
> More experienced users can correct me if I'm wrong, but there are  
> some fundamental differences between Haskell and Python which might  
> influence your choices of lists vs arrays. Haskell, being primarily  
> a compiled language, should give you a speed advantage over  
> interpreted python when you're doing basically the same operations,  
> so you may not need (at least initially) to look for ways to improve  
> speed. My approach basically in any language was to first get the  
> program to work correctly, and to look for speed optimizations only  
> as a secondary step if the program was too slow, and then only for  
> the processes which were the speed bottlenecks.
>
> The second issue is that in Haskell, lists are mutable, whereas  
> arrays are (mostly) immutable. Generally, operations which need to  
> work on every element of a data structure will go faster (and take  
> less memory) if they do not result in the creation of a copy of the  
> data structure, as they need to do for immutable data.
>
> So my tendency would be to look for ways of doing things in Haskell  
> which make use of lists (including lists of lists), and resort to  
> arrays/matrices only for operations which cannot be decomposed into  
> list operations.
>
> Henry
>
>
>
>
> On 2010-09-11, at 10:09 , Lorenzo Isella wrote:
>
>> Dear All,
>> The recent feedback I got on the mailing list led me to think about  
>> the
>> data structure I need for my computations and array manipulations
>> (loosely speaking, let us say that I need indexing and slicing  
>> tables of
>> data).
>> Coming from Python, I am a bit confused: let me say that in my Python
>> scripts I almost never use lists, but rather NumPy arrays.
>> In that case, it is an easy choice (almost every decent software for
>> numerics/visualization etc... in Python relies explicitly or  
>> implicitly
>> on NumPy). On top of that, NumPy is fast, has a lot of inbuilt  
>> functions
>> and interfaces nicely with SciPy, matplotlib, Mayavi2 etc...
>> It seems to me (please correct me if I am mistaken) that the  
>> situation
>> in Haskell is a bit more 'open' to choices.
>> At least I think so when I look at
>>
>> http://en.wikibooks.org/wiki/Haskell/Hierarchical_libraries/Arrays
>>
>> I may want to drop lists at some point for performance reasons but  
>> also
>> because in my work I really have tables of numbers I find  
>> convenient to
>> think of as arrays/matrices (again, loosely speaking, I mean  
>> matrices as
>> arrays + linear algebra like taking the inverse, the determinant  
>> and so on).
>> Bottom line, I would need a data type that
>> (1) is decently fast (OK, there is more than performance to  
>> scripting,
>> but you see my point)
>> (2) allows slicing/indexing (e.g. take 3rd row, second column,  
>> flatten
>> it out, take every element larger than 34 in a row, find its unique
>> elements, sort them etc...) without having to re-invent the wheel
>> myself. This is more on the manipulation side than the linear  
>> algebra.
>> As you can see, I would like to be able to find something similar  
>> to the
>> very useful functions in Data.List for an array.
>> (3) it would be nice if these data type could have either integers of
>> real numbers as entries. If the original data is a made up of integer
>> numbers, conversion to real number is always suspicious (I have  
>> horrible
>> memories (at least in other languages) of numbers that were equal as
>> integers and no longer when read as real numbers because of one last
>> digit changing...which can give you a headache in some cases.
>> (4) again it would be nice if I could feed these arrays/vectors to  
>> tools
>> that take integrals, derivatives, inverse etc...
>>
>> I am considering for this reason several possibilities
>>
>> http://www.haskell.org/tutorial/arrays.html
>> http://users.skynet.be/jyp/html/base/Data-Array.html
>> http://code.haskell.org/hmatrix/hmatrix.pdf
>>
>> hmatrix is probably what I am looking for (linear algebra,  
>> interface to
>> gnuplot for plotting, ODE solver etc...) but there is no  
>> possibility of
>> using arrays of integers, so I am concerned about using it to read  
>> and
>> compare data files filled with integers where I check if certain  
>> entries
>> are equal or not. Also I wonder if I can find any extra documentation
>> other than the well written tutorial (which explains a lot, but  
>> cannot
>> do everything in less than 30 pages and I would have plenty of  
>> questions
>> about array manipulations there).
>>
>> Any suggestion/clarification is more than welcome.
>> Cheers
>>
>> Lorenzo
>>
>>
>>
>> _______________________________________________
>> Beginners mailing list
>> [email protected]
>> http://www.haskell.org/mailman/listinfo/beginners
>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners



------------------------------

Message: 2
Date: Sat, 11 Sep 2010 14:30:50 -0700
From: Ryan Prichard <[email protected]>
Subject: Re: [Haskell-beginners] Stack space overflow with foldl'
To: Daniel Fischer <[email protected]>
Cc: [email protected]
Message-ID: <20100911143050.6fb09...@ryan>
Content-Type: text/plain; charset=US-ASCII

Daniel,

Thanks for your help.

On Fri, 10 Sep 2010 16:20:40 +0200
Daniel Fischer <[email protected]> wrote:

> On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
> > Hi,
> >
> > I see a stack overflow with this code, but I don't understand why.
> 
> Neither do I, really, but ghc is being too clever for its own good -
> or not clever enough.
> 
> >
> > I looked at the Core output with ghc-core, with or without
> > optimizations, and I see a $wlgo recursive function that doesn't
> > appear to end in a tail call.
> 
> Without optimisations, I see a nice tail-recursive lgo inside
> foldl'2, pretty much what one would like to see.

When I ran ghc-core, I used the command-line "ghc-core -- Test.hs".  I
thought it would compile Test.hs without optimizations, but ghc-core
actually has a default string of ghc arguments that includes -O2.  If I
run "ghc-core -- -O0 Test.hs", I also see a nice tail-recursive lgo.

> With optimisations, you get a specialised worker $wlgo:
> 
> Rec {
> Main.$wlgo :: [GHC.Types.Int] -> (##)
> GblId
> [Arity 1
>  NoCafRefs
>  Str: DmdType S]
> Main.$wlgo =
>   \ (w_sms :: [GHC.Types.Int]) ->
>     case case w_sms of _ {
>            [] -> GHC.Unit.();
>            : x_adq xs_adr ->
>              case x_adq of _ { GHC.Types.I# _ ->
>              case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() }
>              }
>          }
>     of _ { () ->
>     GHC.Prim.(##)
>     }
> end Rec }
> 
> and it's here that ghc shows the wrong amount of cleverness.
> 
> What have we? At the types () and [Int], with f = flip seq, the step
> in lgo unfolds to
> 
> lgo z (x:xs)
> ~> let z' = f z x in lgo z' xs
> ~> case f z x of
>       z'@() -> lgo z' xs
> ~> case (case x of { I# _ -> z }) of
>       z'@() -> lgo z' xs
> 
> Now flip seq returns its first argument unless its second argument is
> _|_ and () has only one non-bottom value, so the ()-argument of lgo
> can be eliminated (here, the initial ()-argument is known to be (),
> in general the wrapper checks for _|_ before entering the worker),
> giving
> 
> wlgo :: [Int] -> ()
> wlgo [] = ()
> wlgo (x:xs) =
>   case (case x of { I# _ -> () }) of
>     () -> wlgo xs
> 
> It would be nice if the compiler rewrote the last equation to
> 
> wlgo (x:xs) -> case x of { I# _ -> wlgo xs }
> 
> , but apparently it can't. Still, that's pretty good code.
> Now comes the misplaced cleverness.
> Working with unboxed types is better (faster) than working with boxed 
> types, so let's use unboxed types, giving $wlgo the type
> 
> [Int] -> (##)
> 
> (unboxed 0-tuple). But it wasn't clever enough to directly return (#
> #) in the case of an empty list - that would've allowed the core to be
> 
>  \ (w_sms :: [GHC.Types.Int]) ->
>     case w_sms of _ {
>       [] -> GHC.Prim.(##)
>       : x_adq xs_adr ->
>         case x_adq of _ { GHC.Types.I# _ ->
>           Main.$wlgo xs_adr }
>     }
> 
> and all would've been fine and dandy. So it chose [] -> GHC.Unit.()
> and that forced the second branch to also have that type, hence you
> can't have a tail call there but have to box the result of the
> recursive call (only to unbox it again).
> So you get superfluous boxing, unboxing, reboxing in addition to a
> stack- eating recursion.
> 
> But you've hit a rare case here, if you use foldl'2 (flip seq) at a
> type with more than one non-bottom value, the first argument of lgo
> is not eliminated and you don't get the boxing-unboxing dance,
> instead you get a nice tail-recursion, even if foldl'2 is used only
> once and not exported.
> 
> Why GHC does that for (), beats me.
> 
> > I don't see any let expressions in the
> > folding code, so I assume no thunks are being created.  I can make a
> > tail call appear by doing either of two things:
> >
> > 1. Replace "lgo z []" with "lgo !z []".  This suggestion came from
> > an email on haskell-beginners that I can't find right now.  It was
> > a few months ago.
> 
> Perhaps
> http://www.haskell.org/pipermail/beginners/2010-August/005016.html ?

Yes, that was the email.  It was more recent than I thought.

> > 2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use
> > the other version:
> >
> > foldl' f a []     = a
> > foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
> 
> But that needs to pass also the function, which is generally slower
> than having it as a static argument.
> 
> 3. {-# NOINLINE foldl'2 #-}
> 
> But that's not so good for performance in general.
> 
> Option 1 gives the best Core, but it changes behaviour, with that,
> 
> foldl' (\_ _ -> 1) undefined [0] = 1
> foldl'2 (\_ _ -> 1) undefined [0] = _|_
> 
> To retain the behaviour of foldl',
> 
> foldl'2 f z0 xs0 =
>     case xs0 of
>       [] -> z0
>       (x:xs) -> lgo (f z0 x) xs
>   where
>     lgo !z [] = z
>     lgo z (y:ys) = lgo (f z y) ys

I think I see why foldl'2 and foldl' have the same behavior.

> > My test case is contrived.  Originally, I had a program that read
> > lines from a file as Data.ByteString values, and I was trying to
> > debug a stack overflow.  I added the foldl' call to force the
> > evaluation of the ByteString lines, but the foldl' call itself
> > overflowed the stack.
> 
> That's probably a different matter, foldl' evaluates the accumulation 
> parameter only to weak head normal form, if it's not of simple type,
> it can still contain thunks that will overflow the stack when
> demanded.

I'm using Data.ByteString.ByteString:

Prelude> :info Data.ByteString.ByteString
data Data.ByteString.Internal.ByteString
  = Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr
                                    GHC.Word.Word8)
                                !Int
                                !Int

If I evaluate a value of this type into WHNF, I think the fields are
also evaluated into WHNF due to the strictness annotations.  That
should remove thunks from the Int arguments.  I wonder if the
ForeignPtr field could still have a thunk.

I suppose I really wanted an NFData instance for ByteString so I could
use rnf.

I looked at the NFData instance for a list. It's just:

instance NFData a => NFData [a] where
    rnf [] = ()
    rnf (x:xs) = rnf x `seq` rnf xs

If I just want to evaluate each list element to WHNF, I think I could
write:

forceList [] = ()
forceList (x:xs) = x `seq` forceList xs

This function is simpler than foldl'2, and it turns into a
tail-recursive function in Core.  Maybe I could also use 
"forceList = foldr seq ()".

> > I might have fixed my original stack overflow problem.  I was
> > applying sum to a large list of integers, and sum is lazy.
> 
> For [Int] and [Integer], sum is specialised, so when compiled with 
> optimisations, ghc should use a strict version for those types.

I retested my original program that used sum.  It overflows the stack
with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With
-v4 -ddump-simpl-stats -O, I see

1 RuleFired
    1 SPEC Data.List.sum

I don't see this output with -O0 instead of -O.

-Ryan


------------------------------

Message: 3
Date: Sun, 12 Sep 2010 00:54:55 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Stack space overflow with foldl'
To: Ryan Prichard <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

On Saturday 11 September 2010 23:30:50, Ryan Prichard wrote:
> > That's probably a different matter, foldl' evaluates the accumulation
> > parameter only to weak head normal form, if it's not of simple type,
> > it can still contain thunks that will overflow the stack when
> > demanded.
>
> I'm using Data.ByteString.ByteString:
>
> Prelude> :info Data.ByteString.ByteString
> data Data.ByteString.Internal.ByteString
>   = Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr
>                                     GHC.Word.Word8)
>                                 !Int
>                                 !Int
>
> If I evaluate a value of this type into WHNF, I think the fields are
> also evaluated into WHNF due to the strictness annotations.  That
> should remove thunks from the Int arguments.  I wonder if the
> ForeignPtr field could still have a thunk.

No, the pointer is just a memory address, no thunk there, a (strict) 
ByteString in WHNF is fully evaluated.
The question is what you do with your ByteStrings in the fold.

>
> I suppose I really wanted an NFData instance for ByteString so I could
> use rnf.

In general, you don't need an NFData instance for ByteStrings, since WHNF = 
NF for (strict) ByteStrings and reducing lazy ByteStrings to normal form 
would sort of defeat their purpose. If you need an NFData instance because 
you want to rnf e.g. a Map ByteString stuff, the default implementation of 
rnf is enough.

>
> I looked at the NFData instance for a list. It's just:
>
> instance NFData a => NFData [a] where
>     rnf [] = ()
>     rnf (x:xs) = rnf x `seq` rnf xs
>
> If I just want to evaluate each list element to WHNF, I think I could
> write:
>
> forceList [] = ()
> forceList (x:xs) = x `seq` forceList xs

Yes.

>
> This function is simpler than foldl'2, and it turns into a
> tail-recursive function in Core.  Maybe I could also use
> "forceList = foldr seq ()".
>

Exactly the same. But of course you need to evaluate forceList list to WHNF 
to have it do any work,

case forceList list of
  () -> foo

let !() = forceList list in bar

or something.

> > > I might have fixed my original stack overflow problem.  I was
> > > applying sum to a large list of integers, and sum is lazy.
> >
> > For [Int] and [Integer], sum is specialised, so when compiled with
> > optimisations, ghc should use a strict version for those types.
>
> I retested my original program that used sum.  It overflows the stack
> with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With
> -v4 -ddump-simpl-stats -O, I see
>
> 1 RuleFired
>     1 SPEC Data.List.sum
>
> I don't see this output with -O0 instead of -O.

Specialisations are only used when compiled with optimisations since the 
specialised versions have different names and are inserted via rewrite 
rules. And rewrite rules are ignored with -O0.

So without optimisations, you get the vanilla sum with lazy accumulator.
To avoid stack overflows without optimisations, use foldl' (+) 0 instead of 
sum.

>
> -Ryan



------------------------------

Message: 4
Date: Sat, 11 Sep 2010 22:21:02 -0400
From: Henry Olders <[email protected]>
Subject: Re: [Haskell-beginners] Arrays in Haskell
To: Tobias Brandt <[email protected]>
Cc: Lorenzo Isella <[email protected]>,  "[email protected]"
        <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

Thanks for setting me straight, Tobias. Still thinking in python, I guess...

Henry




On 2010-09-11, at 12:42 , Tobias Brandt wrote:

> On 11 September 2010 18:28, Henry Olders <[email protected]> wrote:
>> The second issue is that in Haskell, lists are mutable, whereas arrays are 
>> (mostly) immutable.
> 
> When did that happen? Of course the standard lists are immutable.



------------------------------

Message: 5
Date: Sun, 12 Sep 2010 05:34:23 +0200
From: Tobias Brandt <[email protected]>
Subject: Re: [Haskell-beginners] Arrays in Haskell
To: Lorenzo Isella <[email protected]>
Cc: [email protected]
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Another alternative is the vector package:

http://hackage.haskell.org/package/vector

It supports slicing and has mutable unboxed arrays (which is pretty
much optimal for speed). The downside is, that they are
one-dimensional. But you can of course make vectors of vectors of ...

There is also a number of vector-* packages with additional stuff like
algorithms and instances for vectors.


------------------------------

Message: 6
Date: Sun, 12 Sep 2010 13:57:50 +0200
From: Lorenzo Isella <[email protected]>
Subject: [Haskell-beginners] Again on List Manipulation
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear All,
First of all, thanks to the whole list for their help.
I am still struggling with things I used to be able to do quite easily 
(though maybe not efficiently) with other languages. I still have to get 
used to the immutable nature of lists and the absence of for loops.
Let us say you have two lists

l = [1,1,1,1,2,2,2,2,2,2,2,3,3,5,6,7] (already sorted)

and a list of corresponding values for every entry in l

m=  [2,2,2,2,4,4,4,4,6,6,4,4,4,10,12,14].
Also consider the list of the unique values in l

l_un = nub l

Now, what I would like to do is to get a third list, let us call it q, 
such that its length is equal to the length of l_un.
On top of that, its i-th entry should be the sum of the entries of m 
corresponding to the i-th values of l_un.
To fix the ideas, q should be

q = [8,32, 8,10,12,14] .
How would you do that?

Cheers

Lorenzo

P.S.: I will need this function both with Integer and Double numbers (I 
may have situation in which the entries of one list or both are real 
numbers, though the goal stays the same)


------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 27, Issue 27
*****************************************

Beginners Digest, Vol 27, Issue 27

Reply via email to