Could you provide a screenshot of that job (expanded), so we can see the
time spent on each step? Also, how many mappers were started in the
map-reduce jobs? Please use this information to check whether there are
any obvious bottlenecks.

Besides increasing the Hadoop cluster capacity and enabling compression,
there are other ways to improve. From the data you provided, the cube's
expansion rate is about 2000% (94.64 GB of cube from 5 GB of source data),
which is too high. This usually indicates that the cardinality of some
dimension column is too high; you may need to check the cardinality and
sample the data to see whether it makes sense to have that column in the
aggregation group.
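
As a rough illustration of the two checks above, here is a minimal Python
sketch. It assumes you can pull a sample of the source rows into memory as
dictionaries; the column names (user_id, country) and the sample rows are
made up for the example, and only the 94.64 GB and 5 GB figures come from
this thread. It is not part of Kylin itself.

```python
from collections import defaultdict


def expansion_rate(cube_size_gb, source_size_gb):
    """Return the cube size as a percentage of the source data size."""
    return cube_size_gb / source_size_gb * 100.0


def column_cardinality(rows, columns):
    """Count distinct values per column in a sample of rows (dicts)."""
    seen = defaultdict(set)
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {col: len(seen[col]) for col in columns}


# Figures reported in this thread: 94.64 GB cube built from 5 GB of data.
rate = expansion_rate(94.64, 5.0)
print(f"expansion rate: {rate:.0f}%")

# Hypothetical sample rows; replace with a real sample of the fact table.
sample = [
    {"user_id": "u1", "country": "US"},
    {"user_id": "u2", "country": "US"},
    {"user_id": "u3", "country": "IN"},
]
cardinality = column_cardinality(sample, ["user_id", "country"])
print(cardinality)
```

A column whose distinct count is close to the number of sampled rows (like
user_id here) is the kind of dimension worth reconsidering before keeping
it in the aggregation group.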

On 5/17/15, 12:19 AM, "Gaurav Nigam" <[email protected]> wrote:

>Let me share some stats that I collected from this cube building activity:
>
>  1.  Cube Build Time : 427 mins
>  2.  Cube Size: 94.64 GB
>
>Actually my interest is to improve the cube build time. I would
>appreciate any help on the following:
>
>  *   Reduce cube build time (like memory allocation, increase cluster
>etc.)
>  *   Reduce cube size (I know I can enable LZO codec to reduce size on
>Hbase. Anything else would be appreciated)
>
>Thanks.
>
>Best Regards
>
>Gaurav Nigam
>+1 (973) 307-5307
>
>From: gaurav nigam <[email protected]>
>Date: Friday, May 15, 2015 at 2:12 PM
>To: "[email protected]" <[email protected]>
>Subject: Hadoop Cluster Capacity Planning
>
>Hi All,
>
>I need some recommendations on sizing the Hadoop cluster. I'm
>building a cube on a 3-node CDH 5.4.0 cluster. It's taking too much
>time to build the cube. Here are the details about the cube:
>
>  1.  Data Size - 5GB
>  2.  There are
>     *   3 Hierarchical dimensions
>     *   2 Normal dimensions
>     *   5 measures
>
>I need some advice on how to decide on the Hadoop cluster size based on
>dimensions, measures and data size.
>
>Thanks in advance!
>
>Best Regards
>
>Gaurav Nigam
>+1 (973) 307-5307
>
