Ah, cool. The getSize() method (it returns a long) gives Crunch's estimate of the size of the object in bytes. Keep in mind that it's a very rough approximation: it is based on the size of the file on disk plus whatever we know about the behavior of the DoFns that are applied to the PTable when it is processed, which each DoFn communicates via its scaleFactor() function.
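
To make the arithmetic concrete, here is a rough sketch in plain Java of how such an estimate could feed the map-side join decision you describe. It does not call Crunch at all: estimateSize, fitsInMemory, and the 256 MB budget are hypothetical names and numbers made up for illustration, and multiplying the on-disk size by each scale factor is an assumption about how the estimate is combined, not a statement of what Crunch does internally.

```java
import java.util.List;

// Hypothetical, Crunch-free model of a size estimate; none of these names are Crunch APIs.
class SizeEstimateSketch {

    // Assumption: the estimate is the on-disk size scaled by the factor each DoFn reports.
    static long estimateSize(long bytesOnDisk, List<Double> scaleFactors) {
        double estimate = bytesOnDisk;
        for (double factor : scaleFactors) {
            estimate *= factor;
        }
        return (long) estimate;
    }

    // True if the estimated size is within the (made-up) memory budget.
    static boolean fitsInMemory(long estimatedBytes, long memoryBudgetBytes) {
        return estimatedBytes <= memoryBudgetBytes;
    }

    public static void main(String[] args) {
        long budget = 256L * 1024 * 1024; // illustrative budget, not a Crunch default

        // Two tables: one large and unfiltered, one small and filtered down to a quarter.
        long leftEstimate = estimateSize(4_000_000_000L, List.of(1.0));
        long rightEstimate = estimateSize(400_000_000L, List.of(0.25));

        long smaller = Math.min(leftEstimate, rightEstimate);
        String smallerSide = leftEstimate <= rightEstimate ? "left" : "right";

        if (fitsInMemory(smaller, budget)) {
            System.out.println("Map-side join candidate; smaller side: " + smallerSide);
        } else {
            System.out.println("Neither side fits; fall back to a reduce-side join");
        }
    }
}
```

Since the estimate is so rough, it is probably wise to leave generous headroom in whatever budget you compare it against.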

On Mon, Feb 24, 2014 at 10:57 AM, Jinal Shah <[email protected]> wrote:

> By size I meant the memory size sorry for the confusion. Like how much
> memory will a PTable object require. Basically what I'm trying to do is if
> the object is not that large and if it could fit in memory I wanted to
> apply map-side join to optimize the join and depending on that I also
> wanted to determine which one is smaller to use the Left join.
>
>
> On Mon, Feb 24, 2014 at 12:45 PM, Josh Wills <[email protected]> wrote:
>
> > There is the length() method, which will return a PObject<Long> with the
> > number of elements in the PCollection. It requires running an MR job
> > though.
> >
> > J
> >
> >
> > On Mon, Feb 24, 2014 at 10:03 AM, Jinal Shah <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Is there a way possible in crunch to find the size of a particular
> > > PCollection or PTable in whole.
> > >
> > > Thanks
> > > Jinal
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
