Amir Youssefi
Thu, 26 Jun 2008 12:53:54 -0700
Checking the code that was just committed, I see that the defaults are there:
- private static long gcActivationSize = Long.MAX_VALUE ;
- private static long spillFileSizeThreshold = 0L ;
+ // if we freed at least this much, invoke GC
+ // (default 40 MB - this can be overridden by user supplied property)
+ private static long gcActivationSize = 40000000L ;
+ // spill file size should be at least this much
+ // (default 5MB - this can be overridden by user supplied property)
+ private static long spillFileSizeThreshold = 5000000L ;
+
+ // this will keep track of memory freed across spills
+ // and between GC invocations
+ private static long accumulatedFreeSize = 0L;
+
+ // fraction of biggest heap for which we want to get
+ // "memory usage threshold exceeded" notifications
+ private static double memoryThresholdFraction = 0.7;
+
+ // fraction of biggest heap for which we want to get
+ // "collection threshold exceeded" notifications
+ private static double collectionMemoryThresholdFraction = 0.5;
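Going only by the comments in the diff above, the two size defaults could interact roughly as sketched below. This is an illustration, not the committed Pig code: the class name SpillPolicySketch and the methods shouldSpill and recordFreed are hypothetical, and the use of ">=" for both comparisons is an assumption read off the "at least this much" wording.

```java
// Hypothetical sketch based on the comments in the diff; not Pig's actual code.
public class SpillPolicySketch {
    // Defaults copied from the diff quoted above.
    private final long gcActivationSize;        // 40 MB by default
    private final long spillFileSizeThreshold;  // 5 MB by default

    // Memory freed across spills since the last GC request.
    private long accumulatedFreeSize = 0L;

    public SpillPolicySketch() {
        this(40000000L, 5000000L);
    }

    public SpillPolicySketch(long gcActivationSize, long spillFileSizeThreshold) {
        this.gcActivationSize = gcActivationSize;
        this.spillFileSizeThreshold = spillFileSizeThreshold;
    }

    // Assumption: a buffer is only worth spilling if it would produce
    // a file of at least spillFileSizeThreshold bytes.
    public boolean shouldSpill(long estimatedBytes) {
        return estimatedBytes >= spillFileSizeThreshold;
    }

    // Records the bytes freed by one spill. Returns true once the running
    // total reaches gcActivationSize, and resets the total at that point.
    public boolean recordFreed(long freedBytes) {
        accumulatedFreeSize += freedBytes;
        if (accumulatedFreeSize >= gcActivationSize) {
            accumulatedFreeSize = 0L;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SpillPolicySketch policy = new SpillPolicySketch();
        long[] candidateSizes = {1000000L, 6000000L, 20000000L, 25000000L};
        for (long size : candidateSizes) {
            if (!policy.shouldSpill(size)) {
                System.out.println("skip  " + size + " bytes (below spill threshold)");
                continue;
            }
            boolean requestGc = policy.recordFreed(size);
            System.out.println("spill " + size + " bytes" + (requestGc ? ", requesting GC" : ""));
            if (requestGc) {
                System.gc(); // only a hint to the JVM
            }
        }
    }
}
```

With the old defaults (Long.MAX_VALUE and 0L), the same logic would spill every buffer no matter how small and never request a GC, which would be consistent with a large number of tiny spill files.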
So I am running it again to see how it goes this time.
Amir
-----Original Message-----
From: Amir Youssefi [EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 12:30 PM
To: pig-user@incubator.apache.org
Subject: RE: Slow tutorial?
Hi Mark,
The pig.jar that comes with the tutorial is old and doesn't include
pig.properties. Try making a new build (June 26th or later) and make
sure you have these in pig.properties:
#Do not spill temp files smaller than this size (bytes)
pig.spill.size.threshold=5000000
#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)
#This should help reduce the number of files being spilled.
pig.spill.gc.activation.size=40000000
or similar numbers...
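As a rough illustration of how two properties like these could override built-in defaults, here is a minimal sketch using plain java.util.Properties. The property keys are the ones quoted above; the class SpillPropertiesSketch and the helper readLong are hypothetical, and Pig's own loading code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch; this is not how Pig itself necessarily reads pig.properties.
public class SpillPropertiesSketch {
    // Fallbacks matching the values suggested in this thread.
    static final long DEFAULT_SPILL_SIZE_THRESHOLD = 5000000L;
    static final long DEFAULT_GC_ACTIVATION_SIZE = 40000000L;

    // Returns the property parsed as a long, or the fallback when the key
    // is missing or its value is not a valid number.
    static long readLong(Properties props, String key, long fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for the contents of a pig.properties file.
        String text =
            "pig.spill.size.threshold=5000000\n" +
            "pig.spill.gc.activation.size=40000000\n";

        Properties props = new Properties();
        props.load(new StringReader(text));

        long spillThreshold =
            readLong(props, "pig.spill.size.threshold", DEFAULT_SPILL_SIZE_THRESHOLD);
        long gcActivation =
            readLong(props, "pig.spill.gc.activation.size", DEFAULT_GC_ACTIVATION_SIZE);

        System.out.println("spill size threshold = " + spillThreshold);
        System.out.println("gc activation size   = " + gcActivation);
    }
}
```

Falling back to the default on a malformed value is a design choice made for this sketch; failing loudly would be equally reasonable.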
Amir
-----Original Message-----
From: Mark Snow [EMAIL PROTECTED]
Sent: Wednesday, June 25, 2008 8:07 PM
To: pig-user@incubator.apache.org
Subject: Slow tutorial?
Hi All,
I downloaded the pig tutorial to give it a whirl, set it up on a hadoop
cluster I've used for a few other tasks (7 nodes, ec2) and went through
the instructions to launch tutorial script1 with the excite bz file on
hdfs. Two things jumped out:
1) Only one mapper launched
2) It's really slow. It's been almost 5 hours and the mapper is still
under 10% complete
Have I misconfigured something? What's a good benchmark run time for the
tutorial scripts to complete?