Beginners Digest, Vol 22, Issue 19

beginners-request Tue, 13 Apr 2010 05:36:32 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."

Today's Topics:

   1.  Re: Space leak in a fold even after the usual    fixes
      (Travis Erdman)
   2. Re:  Re: Space leak in a fold even after the usual        fixes
      (Daniel Fischer)
   3. Re:  Space leak in a fold even after the usual    fixes
      (Stephen Blackheath [to Haskell-Beginners])
   4. Re:  Space leak in a fold even after the usual    fixes
      (Brent Yorgey)
   5. Re:  Compiling C into Haskell (Brent Yorgey)
   6.  Re: Space leak in a fold even after the usual    fixes
      (Heinrich Apfelmus)
   7.  Re: Compiling C into Haskell (Heinrich Apfelmus)
   8. Re:  Compiling C into Haskell (Hein Hundal)

----------------------------------------------------------------------

Message: 1
Date: Mon, 12 Apr 2010 10:11:21 -0700 (PDT)
From: Travis Erdman <[email protected]>
Subject: [Haskell-beginners] Re: Space leak in a fold even after the
        usual   fixes
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

> I take it that size of the array does not depend on  arg ?

> Apparently, despite your attempts to force evaluation, the array still
> contains many unevaluated expressions. Documentation for the  vector
> pacakge reveals that  Data.Vector  is a *boxed* array, i.e. the entries
> are only evaluated on demand, so that's where the unevaluated
> expressions linger around.

> Unless you need the extra laziness - which you probably don't, given how
> you've structured your algorithm - you might want to switch to
> Data.Vector.Unboxed  or  Data.Vector.Storable.

Correct.  The array size doesn't depend on arg (arg only controls how much
of the input list to process.

I did not realize that the boxed arrays would have this extra-lazy behavior.
The reason I chose boxed is because I needed to store these very large trees
(custom data type) that I referenced in an earlier question.  Since D.V.Unboxed
and Storable don't appear to support non-standard data types, is there another
package that might better fit my needs?  Data.IntMap perhaps?  I originally
chose an array over a map because I need fast random access to individual
elements (and cheap modification).

Thanks again,

Travis Erdman

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
http://www.haskell.org/pipermail/beginners/attachments/20100412/a3d00b87/attachment-0001.html

------------------------------

Message: 2
Date: Mon, 12 Apr 2010 20:09:38 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Re: Space leak in a fold even after
        the usual       fixes
To: [email protected]
Cc: Travis Erdman <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="utf-8"

Am Montag 12 April 2010 19:11:21 schrieb Travis Erdman:
> > I take it that size of the array does not depend on  arg ?
> >
> > Apparently, despite your attempts to force evaluation, the array still
> > contains many unevaluated expressions. Documentation for the  vector
> > pacakge reveals that  Data.Vector  is a *boxed* array, i.e. the
> > entries are only evaluated on demand, so that's where the unevaluated
> > expressions linger around.
> >
> > Unless you need the extra laziness - which you probably don't, given
> > how you've structured your algorithm - you might want to switch to
> > Data.Vector.Unboxed  or  Data.Vector.Storable.
>
> Correct.  The array size doesn't depend on arg (arg only controls how
> much of the input list to process.
>
> I did not realize that the boxed arrays would have this extra-lazy
> behavior. The reason I chose boxed is because I needed to store these
> very large trees (custom data type) that I referenced in an earlier
> question.  Since D.V.Unboxed and Storable don't appear to support
> non-standard data types, is there another package that might better fit
> my needs?

You could try to introduce the necessary strictness via Control.DeepSeq 
and/or Control.Parallel.Strategies, strictly writing (newTree `using` rnf) 
to the array.

> Data.IntMap perhaps?  I originally chose an array over a map
> because I need fast random access to individual elements (and cheap
> modification).
>
> Thanks again,
>
> Travis Erdman

------------------------------

Message: 3
Date: Tue, 13 Apr 2010 08:40:43 +1200
From: "Stephen Blackheath [to Haskell-Beginners]"
        <[email protected]>
Subject: Re: [Haskell-beginners] Space leak in a fold even after the
        usual   fixes
To: [email protected]
Cc: Travis Erdman <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Travis,

IMHO most Haskell features give a great benefit with a very small cost,
but laziness is the exception:  It gives a great benefit, but at a
fairly significant cost.  That cost comes in the form of a learning
curve for dealing with space behaviour.

I would add that even though Haskell has some faults, it is overall an
amazingly practical language - possibly the best around at this point in
time.  Out of all the gripes that people rightly make about certain
details of Haskell, the only one that causes any real grief is space
behaviour.  If we can ease the pain of space behaviour, then Haskell can
be all pleasure.  I hope I can help with that!

The first thing to understand is that "a `seq` b" means "when you
evaluate me, evaluate a and b, and return b".  It doesn't actually force
evaluation, it just ties the valuation of one thing to the evaluation of
another.  It's equivalent to just "b" except that it also implies the
evaluation of a.

The next thing to understand is that "a `seq` b" only causes evaluation
of the outermost constructor of a.  For some "strict" types the
outermost constructor is the whole value, so `seq` causes the whole
expression to be evaluated, e.g.

  Int
  Bool
  newtype Metres = Metres Float
  data Point2 = Point2 !Int !Int

Even if it's a simple type, any unevaluated expression can potentially
consume large amounts of memory, because the runtime system can't
garbage collect whatever data structures are needed to evaluate it.
Then you can get space leaks.  With these strict types, life is easy -
you can just use `seq` or foldl' or such like, and then you can
guarantee that these references are no longer hanging around.

But a lot of types are lazy.  Here are some examples.  Their contained
values aren't evaluated by a simple `seq`:

  Maybe a
  data Metres = Metres Float
  (a,b)
  [a]
  Data.Set.Set a

Another point is that the "a" in "a `seq` b" needs to be a let binding
so it's the same "a" as you use in other places, e.g.

  let total' = total + value
      n' = n + 1
  in total' `seq` n' `seq` (total, n)

Because of the two `seq`s, the value of this expression is now strict,
so that using `seq` on it will force the whole thing, even though it is
of type (a, b) which is normally a lazy type.

So this means it's possible to make a normally lazy value - or specific
parts of it - strict by making sure that each publicly exposed function
that constructs the value forces evaluation using appropriate `seq`s.
For example, the keys in a Data.Map structure are treated strictly, and
the values lazily.  This strictness is guaranteed by each function that
manipulates Data.Map being written so as to give this guarantee, and
then a promise is made in the API documentation.  It's important to
document space behaviour, since the caller does need to know it.

So that's how to control evaluation directly.  The
Control.Parallel.Strategies module from 'parallel' contains helpers for
managing evaluation (useful in non-parallel software too).  I think
ultimately if you want to put Haskell to practice use, you do need to be
aware of Haskell's evaluation behaviour.  Until you really understand
it, you will be occasionally plagued by space leaks.  In my experience,
space leaks have been tractable with a combination of space profiling
(which is unavoidable sometimes for big programs) and good
understanding, which has taken a bit of effort to acquire.

Steve

Travis Erdman wrote:
> The driver of my algorithm looks like this:
> 
> foldl' processfn nullarray (take arg infinitelist)
> 
> where processfn takes an array and the next set of inputs,
> processes, and delivers a new updated array (using the
> Data.Vector library).
>  
> Apparently, I have a space leak ... the "maximum residency"
> varies linearly with the size of "arg" supplied, garbage
> collection consumes ~75% of CPU time, and, if the arg is too
> big, the whole thing crashes with an out of memory
> error.  The algorithm should operate in constant
> space.
> 
> As you can see, I'm using a strict left-fold and also have
> made the accumulating array strict in the processfn
> definition with a bang pattern.  So, I'm sort of at a
> loss as to how to resolve this.
> 
> The help provided on this list has been outstanding, thanks
> to all of you; hope you have something left in the tank for
> this one!
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners

------------------------------

Message: 4
Date: Mon, 12 Apr 2010 17:30:34 -0400
From: Brent Yorgey <[email protected]>
Subject: Re: [Haskell-beginners] Space leak in a fold even after the
        usual   fixes
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

On Tue, Apr 13, 2010 at 08:40:43AM +1200, Stephen Blackheath [to 
Haskell-Beginners] wrote:
> 
> Another point is that the "a" in "a `seq` b" needs to be a let binding
> so it's the same "a" as you use in other places, e.g.
> 
>   let total' = total + value
>       n' = n + 1
>   in total' `seq` n' `seq` (total, n)

I assume you meant

>   ...
>   in total' `seq` n' `seq` (total', n')

?

-Brent

------------------------------

Message: 5
Date: Mon, 12 Apr 2010 18:13:32 -0400
From: Brent Yorgey <[email protected]>
Subject: Re: [Haskell-beginners] Compiling C into Haskell
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

On Mon, Apr 12, 2010 at 08:29:51AM -0700, Hein Hundal wrote:
> 
> PS: I feel that I need to comment on your use of the phrase "dumb
> question".  Some beginners are young and they might be put off by
> the use of language that could be interpreted as a put down.
> Perhaps in the future, if you want to use that phrase in a beginners
> forum you could put the word dumb in quotes.  That way, the reader
> is less likely to interpret it as a put down.  :)

I agree with Hein; please don't use language like that when replying
to questions on this list, even if it is only in jest, and even if the
question IS dumb (which Hein's very clearly was not!).

-Brent

------------------------------

Message: 6
Date: Tue, 13 Apr 2010 09:26:10 +0200
From: Heinrich Apfelmus <[email protected]>
Subject: [Haskell-beginners] Re: Space leak in a fold even after the
        usual   fixes
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Travis Erdman wrote:
> 
> I did not realize that the boxed arrays would have this extra-lazy behavior.
> The reason I chose boxed is because I needed to store these very large trees
> (custom data type) that I referenced in an earlier question.  Since 
> D.V.Unboxed
> and Storable don't appear to support non-standard data types, is there another
> package that might better fit my needs?  Data.IntMap perhaps?  I originally
> chose an array over a map because I need fast random access to individual
> elements (and cheap modification).

Concerning your choice of data structure, take note that data structures
are *persistent* in Haskell, i.e. they cannot be mutated. So, while
arrays provide random access in O(1) time, they need O(n) time to update
because the whole array has to be copied (unless you ensure ephemeral
use with the ST or IO monad). Nonetheless, the Data.Vector library
provides functions like  map  that operate on the whole array and run in
O(n) time.

In comparison, Data.IntMap has O(size of Int in bits) lookup and O(size
of Int in bits) update.

Data.Vector is mainly about arrays of bytes, hence the prominence of
Unboxed and Storable. You could make your tree an instance of  Storable
, but I think that's only possible for fixed size types anyway.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

------------------------------

Message: 7
Date: Tue, 13 Apr 2010 09:35:39 +0200
From: Heinrich Apfelmus <[email protected]>
Subject: [Haskell-beginners] Re: Compiling C into Haskell
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Hein Hundal wrote:
> 
> I currently don't know enough about Haskell to answer the question
> "Will Haskell be fast, quick to code, and very useful for this
> project?"

Haskell is extremely quick to code and can be made fast, but you have to
spend time learning it. It's not a language that you can just pick up
like that when coming from an imperative background, especially
concerning the "can be made fast" part.

> For my machine learning project, I will definitely have to do a lot
> of complex operations on matrices.  I want the abstraction power of
> Lisp, but all the compilers for Lisp seem to produce rather slow
> executables.  I was hoping that Haskell might work for this project
> or Haskell plus the LAPACK numerical computation library.

What are the complex operations you want to do on matrices? There is the
 hmatrix  that binds to LAPACK and is a pleasure to use, but it's mainly
about matrix multiplication and other numerical matrix operations.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

------------------------------

Message: 8
Date: Tue, 13 Apr 2010 05:36:23 -0700 (PDT)
From: Hein Hundal <[email protected]>
Subject: Re: [Haskell-beginners] Compiling C into Haskell
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

Heinrich wrote:
>What are the complex operations you want to do on matrices? There 
>is the hmatrix that binds to LAPACK and is a pleasure to use, 
>but it's mainly about matrix multiplication and other numerical 
>matrix operations.

The main operations I need are Matrix Inversion, Transpose, Singular Value 
Decomposition, and QR Factoring.  LAPACK takes care of all of these.  I will 
have to look into hmatrix.  Thanks!

Cheers,
Hein

------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners

End of Beginners Digest, Vol 22, Issue 19
*****************************************

Beginners Digest, Vol 22, Issue 19

Reply via email to