Hi,

I've been writing some code to calculate the stretch factor of a tree of points. What it means is that for every node in a tree (lets call it 'pivot'), I have to traverse the same tree (lets call each node 'current') and sum d_t(pivot, current) / d(pivot, current) for each node, where d_t is the tree path distance between two nodes and d is the Euclidean distance.

What I've been doing is traversing the tree, pivoting up each node into the root position, and calculating the stretch factor for each new tree.

ex.          +===+                 +===+                  +===+
             | 1 |                 | 2 |                  | 3 |
             +===+             .---+===+----.             +===+
            /                 /    |   |     \                 \
       +---+             +---+  +---+  +---+  +---+             +---+
| 2 | ---> | 3 | | 4 | | 5 | | 1 | ---> | 2 | etc.
      .+---+.            +---+  +---+  +---+  +---+            .+---+.
     /   |   \                                                /   |   \
+---+ +---+ +---+ +---+ +---+ +---+ | 3 | | 4 | | 5 | | 4 | | 5 | | 1 | +---+ +---+ +---+ +---+ +---+ +---+


The code I have right now works great on point sets of size ~10,000, but when I go up much higher things start to go wrong. I start hitting stack overflows, so I increased the stack size. For ~100,000 points though, running the code with +RTS -sstderr -K128m it chugged along for over an hour and then died with a stack overflow. The stats said it spent 50 seconds MUT time and 5300 seconds (almost 90 minutes!) GC time, which seems odd.

The performance has been really good for lower numbers of points, but the professor I'm working for wants to handle over a million points later on. I've only been writing Haskell for a year, and I'm not quite sure how to rewrite this so that it won't blow the stack, since I'm pretty sure this kind of tree traversal can't be done with tail calls (I would love to proved wrong, though!). Any help would be appreciated.

Thanks,
Matt

module ASF (
    averageStretchFactor
) where

import Data.Tree
import Data.Foldable (foldr')

type Point = (Double,Double)

square :: Double -> Double
square x = x * x

dist :: Point -> Point -> Double
dist (x1,y1) (x2,y2) = sqrt (square (x2 - x1) + square (y2 - y1))

add :: Tree a -> Tree a -> Tree a
add (Node p sts) t = Node p (t:sts)

-- Calculate the average stretch factor of a tree of size n. It's the sum of -- the average stretch factor of each node in the tree, divided by n choose 2
-- (the number of possible pairs of points in in a tree of size n)
averageStretchFactor :: Tree Point -> Double -> Double
averageStretchFactor tree n = stretchRotations tree / (n * (n - 1) / 2)

-- Calculate the stretch factor of a tree.
-- The stretch of two points is the tree path distance between the two points -- divided by the euclidean distance between the two points. The stretch factor -- of a tree is the sum of the stretches between the root and every other node
-- in the tree.
stretchFactor :: Tree Point -> Double
stretchFactor (Node point sts) = stretch 0 point sts
  where
    stretch _ _ [] = 0
stretch d p ((Node p' sts'):ts) = pd / ed + stretch pd p' sts' + stretch d p ts
      where
        pd = d + dist p p' -- path distance
        ed = dist point p' -- euclidean distance

-- Calculate the stretch factor of every point by pulling up each node in the -- tree to the root position. Note that the overall structure of the tree -- doesn't change, we're essentially just traversing the tree and calculating
-- the stretch factor of each node by pretending we're at the root.
stretchRotations :: Tree Point -> Double
stretchRotations tree = rotate tree []
  where
rotate tree@(Node p sts) path = stretchFactor (foldr' add tree path) + pivot [] sts path p
    pivot _  []     _    _ = 0
pivot ls (r:rs) path p = rotate r (Node p (ls ++ rs) : path) + pivot (r:ls) rs path p


./main 10000 +RTS -sstderr -K128m
ASF = 22.441
   1,298,891,896 bytes allocated in the heap
      20,191,904 bytes copied during GC
       3,107,116 bytes maximum residency (7 sample(s))
          47,480 bytes maximum slop
8 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 2471 collections, 0 parallel, 0.06s, 0.07s elapsed Generation 1: 7 collections, 0 parallel, 0.03s, 0.03s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.05s  ( 13.18s elapsed)
  GC    time    0.09s  (  0.11s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   13.14s  ( 13.29s elapsed)

  %GC time       0.7%  (0.8% elapsed)

  Alloc rate    99,506,082 bytes per MUT second

  Productivity  99.3% of total user, 98.2% of total elapsed


./main 100000 +RTS -sstderr -K128m
Stack space overflow: current size 128000000 bytes.
Use `+RTS -Ksize' to increase it.
  33,097,322,168 bytes allocated in the heap
   3,004,248,280 bytes copied during GC
     704,391,196 bytes maximum residency (29 sample(s))
      67,420,196 bytes maximum slop
1731 MB total memory in use (14 MB lost due to fragmentation)

Generation 0: 62561 collections, 0 parallel, 5364.16s, 5619.83s elapsed Generation 1: 29 collections, 0 parallel, 4.50s, 48.73s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   49.85s  ( 78.94s elapsed)
  GC    time  5368.66s  (5668.56s elapsed)
  EXIT  time    0.00s  (  7.88s elapsed)
  Total time  5418.51s  (5747.56s elapsed)

  %GC time      99.1%  (98.6% elapsed)

  Alloc rate    663,930,946 bytes per MUT second

  Productivity   0.9% of total user, 0.9% of total elapsed


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to