Hi Shi,

It's alright! My source Hive table is around 3 GB. Even though the table is partitioned and holds around 50-70 MB of data per partition, only a single mapper and a single reducer are being spawned. The amount of data processed in the M/R job is small, yet it takes a hell of a lot of time.
As mentioned in the earlier mail, the job is very slow: the Build Base Cuboid Data step alone takes around 50 minutes to complete. I can tweak the reducer parameter you mentioned, but do you think that will make a difference, since most of the time is spent in the mapper? Could you share your thoughts on performance tuning for the cube build?

Thanks!

On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]> wrote:

> Hi, sorry, a busy weekend;
>
> Usually Kylin will request proper number of mapper and reducers; If you
> see single mapper/reducer, how much of your input and output? If your
> cube is quite small, single mapper/reducer is possible;
>
> Number of mappers is decided by the FileInputFormat; But number of reducer
> was set by Kylin, see:
> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
>
> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]> wrote:
>
> >Urgent call, any follow up on this?
> >
> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra <[email protected]>
> >wrote:
> >
> >> Why org.apache.kylin.job.hadoop.cube.CuboidReducer is running Single
> >> Mapper/Reducer for the job. Can I have the understanding behind the
> >> reason of running it as single mapper/reducer.
> >>
> >> Thanks!
> >>
> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra <[email protected]>
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I am building a cube using Kylin and I could see that the job is
> >>> running with Single Mapper and Reducer for some of the intermediate
> >>> process such as
> >>>
> >>> Extract Fact Table Distinct Columns
> >>> Build Dimension Dictionary
> >>> Build N-Dimension Cuboid
> >>>
> >>> I am not sure what's the reason behind running the job with single M/R,
> >>> is it really necessary or is it some default config. which can be
> >>> tweaked, its 70 Mins and the job status is 25% !
> >>>
> >>> Urgent Call!
> >>>
> >>> Thanks!
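
For reference, on the point in the quoted reply that the number of mappers is decided by FileInputFormat: below is a rough sketch of the split arithmetic that Hadoop's stock FileInputFormat applies to each splittable input file. It is only an illustration, not Kylin's code. The function names are made up for this example, the 128 MB block size is an assumed cluster default, and it assumes the job reads its input through plain FileInputFormat (a combining input format would behave differently). The min/max split sizes correspond to the Hadoop properties mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.

```python
# Sketch of FileInputFormat's per-file split arithmetic (illustrative only).
# Function names are hypothetical; sizes are in bytes and assumed positive.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger


def compute_split_size(block_size, min_size, max_size):
    """Split size = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits (and so map tasks) for one splittable, non-empty file."""
    remaining = file_size
    splits = 0
    # Carve off full-size splits while the remainder exceeds 1.1 x split size.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly slightly larger) split.
    if remaining != 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # Assumed defaults: 128 MB block, min size 1 byte, max size effectively unbounded.
    split_size = compute_split_size(128 * MB, 1, 2**63 - 1)
    # One 60 MB file (roughly one partition of the size mentioned above).
    print(count_splits(60 * MB, split_size))
    # One 3 GB file.
    print(count_splits(3 * 1024 * MB, split_size))
```

Under these assumptions each input file gets at least one split, so a single mapper suggests that the step's actual input (for example an intermediate table written as one small file) is much smaller than the 3 GB source table. Lowering the maximum split size is the usual way to get more mappers from the same input, but whether that helps here depends on what the step really reads.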
