Yes, you can adjust these parameters, for example by giving a smaller value for kylin.job.mapreduce.default.reduce.input.mb; but that only affects the number of reducers.
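To be concrete, the three properties you asked about only drive how many reducers each MR step gets. A rough sketch of the calculation (simplified from CuboidJob's reducer sizing; the literal values below are just placeholders for your configured settings, not necessarily the defaults):

    // rough sketch of how the reducer count for a cuboid step is derived
    static int estimateReducers(long totalMapInputBytes) {
        double inputMB     = totalMapInputBytes / 1024.0 / 1024.0;
        double perReduceMB = 500.0;  // kylin.job.mapreduce.default.reduce.input.mb (example value)
        double countRatio  = 1.0;    // kylin.job.mapreduce.default.reduce.count.ratio (example value)
        int    maxReducers = 500;    // kylin.job.mapreduce.max.reducer.number (example value)
        int reducers = (int) Math.round(inputMB / perReduceMB * countRatio);
        return Math.max(1, Math.min(reducers, maxReducers));  // clamp to [1, maxReducers]
    }

So lowering reduce.input.mb or raising reduce.count.ratio gives you more reducers (up to the max), but none of this touches the mapper side.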
I suggest you investigate why only 1 mapper is started; factors like the Hadoop cluster size and the HDFS file block size will impact this (see the rough split-size sketch at the bottom of this mail). You can run a SQL query with hive -e (a query which needs MR, not a simple select *), and then use the MR job tracking URL to see how many mappers are triggered. If it is still a single mapper, then the problem is in your Hadoop configuration; otherwise it may be in Kylin, so check whether you put some additional parameter in conf/kylin_job_conf.xml.

On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]> wrote:

>Can I have the specification for these properties?
>
>KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
>"kylin.job.mapreduce.default.reduce.count.ratio";
>KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
>"kylin.job.mapreduce.default.reduce.input.mb";
>KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
>"kylin.job.mapreduce.max.reducer.number";
>
>Thanks!
>
>On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra <[email protected]>
>wrote:
>
>> Hi Shi,
>>
>> It's alright!
>> So I was wondering: my source Hive table is around 3 GB, and despite the
>> table being partitioned and holding around 50-70 MB of data per
>> partition, only a single mapper and reducer are spawned. The amount of
>> data being processed in the M/R is small, as expected, but it takes a
>> very long time.
>>
>> As mentioned in the trailing mail, the job is getting very slow; the
>> Build Base Cuboid Data step alone takes around 50 minutes to complete.
>>
>> I can tweak the reducer parameters you mentioned, but do you think that
>> will make a difference, since the mapper is where most of the time is
>> spent?
>>
>> Can you share your thoughts on performance tuning for the cube build?
>>
>> Thanks!
>>
>> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]> wrote:
>>
>>> Hi, sorry, a busy weekend;
>>>
>>> Usually Kylin will request a proper number of mappers and reducers; if
>>> you see a single mapper/reducer, how much is your input and output? If
>>> your cube is quite small, a single mapper/reducer is possible.
>>>
>>> The number of mappers is decided by the FileInputFormat, but the number
>>> of reducers is set by Kylin, see:
>>>
>>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
>>>
>>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]> wrote:
>>>
>>> >Urgent call, any follow up on this?
>>> >
>>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra <[email protected]>
>>> >wrote:
>>> >
>>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer running a
>>> >> single mapper/reducer for the job? Can I get an understanding of the
>>> >> reason for running it as a single mapper/reducer?
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra
>>> >> <[email protected]> wrote:
>>> >>
>>> >>> Hi All,
>>> >>>
>>> >>> I am building a cube using Kylin and I can see that the job is
>>> >>> running with a single mapper and reducer for some of the
>>> >>> intermediate steps, such as:
>>> >>>
>>> >>> Extract Fact Table Distinct Columns
>>> >>> Build Dimension Dictionary
>>> >>> Build N-Dimension Cuboid
>>> >>>
>>> >>> I am not sure of the reason for running the job with a single M/R;
>>> >>> is it really necessary, or is it some default config which can be
>>> >>> tweaked? It has been 70 minutes and the job status is 25%!
>>> >>>
>>> >>> Urgent Call!
>>> >>>
>>> >>> Thanks!
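One more note on the mapper side, since the number of mappers is decided by the FileInputFormat: the mapper count is essentially the number of input splits. A much-simplified sketch of how the split size (and hence the mapper count) is derived for splittable input (the block size here is only an example; non-splittable compressed files such as gzip get one mapper per file regardless):

    // simplified view of FileInputFormat split sizing for splittable input
    static long estimateMappers(long totalInputBytes) {
        long blockSize = 128L * 1024 * 1024;  // HDFS block size (example value)
        long minSize   = 1L;                  // mapreduce.input.fileinputformat.split.minsize
        long maxSize   = Long.MAX_VALUE;      // mapreduce.input.fileinputformat.split.maxsize
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        return (totalInputBytes + splitSize - 1) / splitSize;  // one mapper per split, roughly
    }

With a 3 GB splittable input and a 128 MB block size you would normally expect around 24 mappers; a single mapper usually means a single small or non-splittable input file, or a combine input format merging the splits.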
