On 17/06/2010 06:23, braver wrote:
With @dafis's help, there's a version tagged cafe3 on the master
branch which performs better with ByteString.  I also went ahead
and interned each ByteString as an Int, converting the structure to
an IntMap everywhere.  That's reflected on the new "intern" branch at
tag cafe4.
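
Roughly, the interning works like this -- a minimal sketch, not the
actual cafe4 code; InternTable, intern and PerUser are just
illustrative names:

import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap as IM
import qualified Data.Map as M

-- Maps each distinct user ByteString to a small Int key.
data InternTable = InternTable
  { nextKey :: !Int
  , keyOf   :: !(M.Map B.ByteString Int)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 M.empty

-- Return the key for a name, allocating a fresh one if unseen.
intern :: B.ByteString -> InternTable -> (Int, InternTable)
intern name t@(InternTable n m) =
  case M.lookup name m of
    Just k  -> (k, t)
    Nothing -> (n, InternTable (n + 1) (M.insert name n m))

-- Once every user is keyed by an Int, the per-user structures can
-- live in an IntMap instead of a Map ByteString:
type PerUser a = IM.IntMap a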

Still, it can't do the full 35 days for all users.  It comes close,
however, to 30 days under GHC 6.12 with the IntMap -- just where 6.10
was with Map ByteString.  Some profiling output is in the prof/
subdirectory, with the responsible tag and the RTS profiling option in
the file name; the .prof files are from -P, and the rest are from -hX.

When I downsize the sample data to 1 million users, the whole run,
with -P profiling, is done in 7.5 minutes.  Something happens when
tripling that amount.  For instance, setting -A10G may cause a
segfault: after a fast run up to 10 days, it seems to stall, then
dumps days up to 28 before segfaulting.  -A5G comes closest, reaching
30 days, when coupled with -H1G.  It's not clear to me how to make -A
and -H work together.
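
(As I understand it, -A sets the nursery/allocation-area size used for
each GC, while -H gives the RTS a suggested overall heap size; I pass
them between +RTS and -RTS on the command line, e.g.

  ./histo +RTS -A5G -H1G -sstderr -RTS users.dat

where the program and data file names are just placeholders.)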

I'll work with Simon to investigate the runtime, but would welcome any
ideas on further speeding up cafe4.

An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion.

This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it; I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26. Here are the stats from the working program:

 392,908,177,040 bytes allocated in the heap
 174,455,211,920 bytes copied during GC
  24,151,940,568 bytes maximum residency (6 sample(s))
  36,857,590,520 bytes maximum slop
           64029 MB total memory in use (1000 MB lost due to fragmentation)

  Generation 0:    62 collections,     0 parallel, 352.35s, 357.13s elapsed
  Generation 1:     6 collections,     0 parallel, 180.63s, 209.19s elapsed

  INIT  time    0.00s  (  0.11s elapsed)
  MUT   time  1201.47s  (1294.29s elapsed)
  GC    time  532.98s  (566.33s elapsed)
  EXIT  time    0.00s  (  5.34s elapsed)
  Total time  1734.46s  (1860.74s elapsed)

  %GC time      30.7%  (30.4% elapsed)

  Alloc rate    327,020,156 bytes per MUT second

  Productivity  69.3% of total user, 64.6% of total elapsed
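
(As a sanity check, the productivity figures are just the mutator's
share of the run: 1201.47s / 1734.46s ≈ 69.3% of user time, and
1201.47s / 1860.74s ≈ 64.6% of total elapsed time.)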


The slop calculation is off a bit, because slop for pinned objects (ByteStrings) isn't being calculated properly; I should really fix that.

Cheers,
        Simon