I agree with yu feng; you need to think about whether you really need to build such a high-cardinality dimension into the Cube.
For example, if the column is something like a free-text description or a timestamp, it doesn't make sense to have it in the Cube, as Kylin is an OLAP engine, not a general-purpose database; you'd better redesign the cube. If it is something like a "seller_id" (assuming you have a large number of sellers, as eBay does) and you need to aggregate the data by each seller_id, that is a valid case for UHC. Think it over and then decide how to move on.

2016-01-09 9:52 GMT+08:00 yu feng <[email protected]>:

> Assume the average size of this column is 32 bytes; a cardinality of 50
> million then means roughly 1.5GB. In the 'Extract Fact Table Distinct
> Columns' step, the mappers read from the intermediate table and remove
> duplicate values (in the Combiner). However, this job starts more than one
> mapper and just one reducer, so the input to the reducer is more than
> 1.5GB, and in the reduce function Kylin creates a new Set to hold all the
> unique values, which is another 1.5GB.
>
> I have encountered this problem, and I had to change the MR config
> properties for every job. I modified these properties:
>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Xmx6000M</value>
>   <description>Larger heap-size for child jvms of reduces.</description>
> </property>
>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>8000</value>
>   <description>Larger resource limit for reduces.</description>
> </property>
>
> You can check the values of those properties currently in use and
> increase them.
>
> Lastly, ask yourself whether you really need all the detailed values of
> those two columns. If not, you can create a view to change the source
> data, or simply not use a dictionary when creating the cube and set the
> length value for them in the 'Advanced Setting' step.
>
> Hope this is helpful to you.
>
> 2016-01-09 6:17 GMT+08:00 zhong zhang <[email protected]>:
>
> > Hi All,
> >
> > There are two ultra-high-cardinality columns in our cube. Both of them
> > are over 50 million in cardinality. When building the cube, it keeps
> > giving us the error "Error: GC overhead limit exceeded" for the reduce
> > jobs at the step Extract Fact Table Distinct Columns.
> >
> > We've just updated to version 1.2.
> >
> > Can anyone give some ideas to solve this issue?
> >
> > Best regards,
> > Zhong
> >

--
Best regards,

Shaofeng Shi
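[Editor's note] One way to apply yu feng's reducer settings to all of Kylin's MapReduce jobs, instead of changing them per job, is to put them into Kylin's Hadoop job configuration file. The snippet below is a minimal sketch only; it assumes a Kylin 1.x installation where that file is conf/kylin_job_conf.xml, and the exact values should be checked against your cluster's current settings before changing them.

<!-- conf/kylin_job_conf.xml (file location is an assumption for Kylin 1.x) -->
<configuration>
  <!-- YARN container size for each reduce task, in MB -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8000</value>
    <description>Larger resource limit for reduces.</description>
  </property>
  <!-- JVM heap for the reduce task; keep -Xmx below the container size
       so non-heap JVM memory still fits inside the YARN container -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6000M</value>
    <description>Larger heap-size for child jvms of reduces.</description>
  </property>
</configuration>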
