Could you provide a screenshot of that job (expanded), so we can see the time spent on each step? Also, how many mappers were started in the MapReduce jobs? Please use this to check whether there are any obvious bottlenecks.
Besides increasing the Hadoop cluster capacity and enabling compression, there are other ways to improve. From the data you provided, the cube's expansion rate is about 2000%, which is too high. This usually indicates that the cardinality of some dimension column is too high; you may need to check the cardinality and sample the data to see whether it makes sense to have that column in the aggregation group.

On 5/17/15, 12:19 AM, "Gaurav Nigam" <[email protected]> wrote:

>Let me share some stats that I collected out this cube building activity:
>
> 1. Cube Build Time : 427 mins
> 2. Cube Size: 94.64 GB
>
>Actually my interest is to improve on cube building time. What I would
>appreciate any help on following:
>
> * Reduce cube build time (like memory allocation, increase cluster
>etc.)
> * Reduce cube size (I know I can enable LZO codec to reduce size on
>Hbase. Anything else would be appreciated)
>
>Thanks.
>
>Best Regards
>
>Gaurav Nigam
>+1 (973) 307-5307
>
>From: gaurav nigam
><[email protected]<mailto:[email protected]>>
>Date: Friday, May 15, 2015 at 2:12 PM
>To:
>"[email protected]<mailto:[email protected]>"
><[email protected]<mailto:[email protected]>>
>Subject: Hadoop Cluster Capacity Planning
>
>Hi All,
>
>I would need some recommendations on sizing the Hadoop cluster. I'm
>building a cube with on 3 node CDH 5.4.0 cluster. It's taking too much of
>time in building a cube. Here are the details about the cube-
>
> 1. Data Size - 5GB
> 2. There are
> * 3 Hierarchical dimensions
> * 2 Normal dimensions
> * 5 measures
>
>I would need some advice on how to decide on Hadoop cluster size based on
>dimensions, measure and data size.
>
>Thanks in advance!
>
>Best Regards
>
>Gaurav Nigam
>+1 (973) 307-5307
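
For reference, the expansion rate quoted in the reply can be checked with simple arithmetic. The sketch below assumes the expansion rate means cube size divided by source data size, expressed as a percentage; that definition and the function name `expansion_rate_percent` are illustrative assumptions, while the two sizes come from the figures in the thread (5 GB of source data, 94.64 GB of cube).

```python
def expansion_rate_percent(cube_size_gb: float, source_size_gb: float) -> float:
    """Return cube size as a percentage of source data size.

    Assumed definition: expansion rate = cube size / source size * 100.
    """
    if source_size_gb <= 0:
        raise ValueError("source size must be positive")
    return cube_size_gb / source_size_gb * 100


# Figures reported in the thread: 94.64 GB cube built from 5 GB of data.
rate = expansion_rate_percent(94.64, 5.0)
print(f"Expansion rate: {rate:.0f}%")
```

Under that assumption the result is roughly 1900%, which matches the "about 2000%" estimate in the reply.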
