Hi Mishra,

It is as you described: since the data size is only 30 MB, Hadoop will
run the MR job with a single mapper and a single reducer.

As for why it takes so long to run on such a small dataset, could you
please check the *map reduce task status* web page to see how much time
the MR job itself really takes? That way we can tell whether the time is
being spent in the MR job or in Kylin's job scheduling module.
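
To illustrate why a 30 MB input ends up with one mapper, here is a rough sketch of how a FileInputFormat-style splitter counts splits for a single file. It assumes the input is read through such a splitter with the default minimum and maximum split sizes. The function names are made up for this example and are not Kylin or Hadoop APIs, and the input format actually used by the Build Base Cuboid Data step is not confirmed here.

```python
MB = 1024 * 1024

# Hypothetical helper names; this only mirrors the general splitting logic
# of Hadoop's FileInputFormat for one file, it is not Kylin or Hadoop code.

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size is the block size clamped between the min and max split size."""
    return max(min_size, min(max_size, block_size))


def estimate_splits(file_size, block_size, min_size=1, max_size=2**63 - 1, slop=1.1):
    """Estimate the number of input splits (and so mappers) for one file.

    A new full-size split is cut only while the remaining bytes exceed the
    split size by more than the slop factor; any leftover bytes then form
    one final split.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = 0
    remaining = file_size
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # 30 MB of input with a 128 MB block size fits into a single split.
    print(estimate_splits(30 * MB, 128 * MB))
```

With a 128 MB block size, any single file up to roughly 140 MB still fits into one split under this logic, which is consistent with the single mapper you see for the 30 MB input.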

On Mon, Jun 15, 2015 at 4:51 PM, Vineet Mishra <[email protected]> wrote:
> Shi,
>
> Hadoop is set up correctly on my cluster with the default block size of
> 128 MB, and it is indeed running multiple Mapper/Reducer based jobs
> for other cases.
>
> Only the Kylin cube build is running as a single Mapper/Reducer job.
>
> Moreover, to my surprise, the 4th job, Build Base Cuboid Data, shows a
> data size of 30 MB. Is that the reason only a single mapper is getting
> invoked? If so, why does it take around 50 min to process such a small
> data set?
>
> Thanks,
>
> On Mon, Jun 15, 2015 at 11:17 AM, Shi, Shaofeng <[email protected]> wrote:
>
> > Yes, you can adjust these parameters; for example, give a smaller value
> > for kylin.job.mapreduce.default.reduce.input.mb. But it only affects the
> > number of reducers.
> >
> > I suggest you investigate why only 1 mapper is started. Factors such as
> > Hadoop cluster size and HDFS file block size affect this.
> > You can run a SQL query (one that needs to run MR, not a simple select *)
> > with hive -e, and then use the MR job tracking URL to see how many mappers
> > are triggered. If it is still a single mapper, then the problem is in your
> > Hadoop configuration. Otherwise it may lie in Kylin; check whether you put
> > any additional parameters in conf/kylin_job_conf.xml.
> >
> >
> > On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]> wrote:
> >
> > >Can I have the specification for these properties?
> > >
> > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
> > >"kylin.job.mapreduce.default.reduce.count.ratio";
> > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
> > >"kylin.job.mapreduce.default.reduce.input.mb";
> > >KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
> > >"kylin.job.mapreduce.max.reducer.number";
> > >
> > >Thanks!
> > >
> > >On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra <[email protected]>
> > >wrote:
> > >
> > >> Hi Shi,
> > >>
> > >> It's alright!
> > >> So I was wondering: my source Hive table is around 3 GB, and despite
> > >> the table being partitioned and holding around 50-70 MB of data per
> > >> partition, only a single Mapper and Reducer are getting spawned. The
> > >> amount of data being processed in the M/R is nothing as expected, yet
> > >> it takes a hell of a lot of time.
> > >>
> > >> As mentioned in the trailing mail, the job is getting very slow; the
> > >> Build Base Cuboid Data step itself takes around 50 mins to
> > >> complete.
> > >>
> > >> I can tweak the reducer parameter you mentioned, but do you think that
> > >> will make a difference, since the mapper is where most of the time is
> > >> spent?
> > >>
> > >> Can you share your thoughts on performance tuning for the cube build?
> > >>
> > >> Thanks!
> > >>
> > >> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi, sorry, a busy weekend.
> > >>>
> > >>> Usually Kylin will request a proper number of mappers and reducers. If
> > >>> you see a single mapper/reducer, how large are your input and output?
> > >>> If your cube is quite small, a single mapper/reducer is possible.
> > >>>
> > >>> The number of mappers is decided by the FileInputFormat, but the number
> > >>> of reducers is set by Kylin; see:
> > >>>
> > >>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
> > >>>
> > >>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]> wrote:
> > >>>
> > >>> >Urgent call, any follow up on this?
> > >>> >
> > >>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra <[email protected]>
> > >>> >wrote:
> > >>> >
> > >>> >>
> > >>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer running with a
> > >>> >> single Mapper/Reducer for the job? Can I get an understanding of the
> > >>> >> reason for running it as a single mapper/reducer?
> > >>> >>
> > >>> >> Thanks!
> > >>> >>
> > >>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra <[email protected]>
> > >>> >> wrote:
> > >>> >>
> > >>> >>> Hi All,
> > >>> >>>
> > >>> >>> I am building a cube using Kylin and I can see that the job is
> > >>> >>> running with a single Mapper and Reducer for some of the
> > >>> >>> intermediate steps, such as:
> > >>> >>>
> > >>> >>> Extract Fact Table Distinct Columns
> > >>> >>> Build Dimension Dictionary
> > >>> >>> Build N-Dimension Cuboid
> > >>> >>>
> > >>> >>> I am not sure what the reason is for running the job with a single
> > >>> >>> M/R. Is it really necessary, or is it some default config that can
> > >>> >>> be tweaked? It has been 70 mins and the job status is 25%!
> > >>> >>>
> > >>> >>> Urgent Call!
> > >>> >>>
> > >>> >>> Thanks!
> > >>> >>>
> > >>> >>
> > >>> >>
> > >>>
> > >>>
> > >>
> >
> >
>