Beginners Digest, Vol 22, Issue 8

beginners-request Wed, 07 Apr 2010 09:32:08 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1.  Functional Data Structure with Cheapest  Modification?
      (Travis Erdman)
   2. Re:  Functional Data Structure with Cheapest      Modification?
      (Edward Z. Yang)
   3. Re:  Functional Data Structure with Cheapest      Modification?
      (Daniel Fischer)
   4. Re:  Functional Data Structure with Cheapest      Modification?
      (Daniel Fischer)
   5.  Re: Functional Data Structure with Cheapest      Modification?
      (Heinrich Apfelmus)


----------------------------------------------------------------------

Message: 1
Date: Tue, 6 Apr 2010 12:31:30 -0700 (PDT)
From: Travis Erdman <[email protected]>
Subject: [Haskell-beginners] Functional Data Structure with Cheapest
        Modification?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=iso-8859-1

In Real World Haskell, the authors state that "the attraction of a tree to a 
functional programmer is cheap modification."  They go on to explain that in a 
tree of say 10,000 nodes, approx 9,985 would be reused (on average) when 
updating a single node (rather than having to copy all 10k nodes, like a 
Haskell list would have to do).

In my application, I have a collection of (only) about 200 objects.  So from 
the surface, not large at all.  However, each object in the collection is a 
large tree that might be up to 20 MB in size.  The application pulls objects 
one at a time from the collection, performs some processing, and delivers a 
new, updated object that needs to be inserted in the collection for future 
processing (the original object is now obsolete).  I'm guessing a Data.IntMap 
might involve copying over about 8 objects for each single object update, which 
while better than copying all 200, is still not perfect.

What would be the most efficient Haskelly way to maintain my collection?

I'm wondering also if there is some sort of "pointer list" I could use.  I 
think recreating a list of 200 pointers (199 of which are unchanged) might be a 
pretty efficient solution, if it exists.

Sadly, I've tried to understand Mutable Arrays (of a few different flavors), 
but all the monadic stuff is over my head at the moment, and I couldn't find 
sufficient documentation/tutorial that I could understand to implement them.


      



------------------------------

Message: 2
Date: Tue, 06 Apr 2010 15:49:10 -0400
From: "Edward Z. Yang" <[email protected]>
Subject: Re: [Haskell-beginners] Functional Data Structure with
        Cheapest        Modification?
To: Travis Erdman <[email protected]>
Cc: beginners <[email protected]>
Message-ID: <1270583263-sup-8...@ezyang>
Content-Type: text/plain; charset=UTF-8

Excerpts from Travis Erdman's message of Tue Apr 06 15:31:30 -0400 2010:
> In my application, I have a collection of (only) about 200 objects.  So from
> the surface, not large at all.  However, each object in the collection is a
> large tree that might be up to 20 MB in size.  The application pulls objects
> one at a time from the collection, performs some processing, and delivers a
> new, updated object that needs to be inserted in the collection for future
> processing (the original object is now obsolete).  I'm guessing a Data.IntMap
> might involve copying over about 8 objects for each single object update,
> which while better than copying all 200, is still not perfect.

The construction of Haskell tree types is such that the tree is automatically
storing pointers to the actual object: it's only if you ask that the value
be unboxed explicitly that it gets stored with the node.  So you're only paying
for a few pointer copies, not a huge 20MB copy.

Have you profiled your application and found this to be the bottleneck?

Cheers,
Edward


------------------------------

Message: 3
Date: Tue, 6 Apr 2010 21:51:47 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Functional Data Structure with
        Cheapest        Modification?
To: [email protected]
Cc: Travis Erdman <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

Am Dienstag 06 April 2010 21:31:30 schrieb Travis Erdman:
> In Real World Haskell, the authors state that "the attraction of a tree
> to a functional programmer is cheap modification."  They go on to
> explain that in a tree of say 10,000 nodes, approx 9,985 would be reused
> (on average) when updating a single node (rather than having to copy all
> 10k nodes, like a Haskell list would have to do).
>
> In my application, I have a collection of (only) about 200 objects.  So
> from the surface, not large at all.  However, each object in the
> collection is a large tree that might be up to 20 MB in size.  The
> application pulls objects one at a time from the collection, performs
> some processing, and delivers a new, updated object that needs to be
> inserted in the collection for future processing (the original object is
> now obsolete).  I'm guessing a Data.IntMap might involve copying over
> about 8 objects for each single object update, which while better than
> copying all 200, is still not perfect.
>
> What would be the most efficient Haskelly way to maintain my collection?
>
> I'm wondering also if there is some sort of "pointer list" I could use. 
> I think recreating a list of 200 pointers (199 of which are unchanged)
> might be a pretty efficient solution, if it exists.
>
> Sadly, I've tried to understand Mutable Arrays (of a few different
> flavors), but all the monadic stuff is over my head at the moment, and I
> couldn't find sufficient documentation/tutorial that I could understand
> to implement them.
>

An immutable array should be fine for this. When you modify one tree, you 
create a new array of 200 pointers to trees, 199 of them are just copied 
from the old array, one points to your new modified tree.

Of course, with STArrays, there would be no need to copy any pointers and 
allocate new arrays, so in principle it should be even more efficient, but 
boxed mutable arrays and the garbage collector don't play too well together
(http://hackage.haskell.org/trac/ghc/ticket/650), so it might not be 
better.


------------------------------

Message: 4
Date: Tue, 6 Apr 2010 22:08:58 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Functional Data Structure with
        Cheapest        Modification?
To: [email protected]
Cc: Travis Erdman <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

Am Dienstag 06 April 2010 21:51:47 schrieb Daniel Fischer:
>
> An immutable array should be fine for this. When you modify one tree,
> you create a new array of 200 pointers to trees, 199 of them are just
> copied from the old array, one points to your new modified tree.
>
> Of course, with STArrays, there would be no need to copy any pointers
> and allocate new arrays, so in principle it should be even more
> efficient, but boxed mutable arrays and the garbage collector don't play
> too well together (http://hackage.haskell.org/trac/ghc/ticket/650), so
> it might not be better.

Forgot to check before posting: it's fixed now, the fix will be in 6.12.2.

And what Edward said, only pointers will be copied anyway, so a Map can 
well be better than an array, depends on your access patterns.


------------------------------

Message: 5
Date: Wed, 07 Apr 2010 09:27:30 +0200
From: Heinrich Apfelmus <[email protected]>
Subject: [Haskell-beginners] Re: Functional Data Structure with
        Cheapest        Modification?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Am 06.04.10 21:31, Travis Erdman wrote:
> In Real World Haskell, the authors state that "the attraction of a
> tree to a functional programmer is cheap modification."  They go on
> to explain that in a tree of say 10,000 nodes, approx 9,985 would be
> reused (on average) when updating a single node (rather than having
> to copy all 10k nodes, like a Haskell list would have to do).
> 
> In my application, I have a collection of (only) about 200 objects.
> So from the surface, not large at all.  However, each object in the
> collection is a large tree that might be up to 20 MB in size.  The
> application pulls objects one at a time from the collection, performs
> some processing, and delivers a new, updated object that needs to be
> inserted in the collection for future processing (the original object
> is now obsolete).  I'm guessing a Data.IntMap might involve copying
> over about 8 objects for each single object update, which while
> better than copying all 200, is still not perfect.

The same reasoning that applies to trees also applies to the objects.
The only thing that will be copied are about log(200) *pointers* to your
objects; the 20MB objects themselves will never be copied.


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 22, Issue 8
****************************************

Beginners Digest, Vol 22, Issue 8

Reply via email to