Yes, you can adjust these parameters, for example by giving a smaller value for kylin.job.mapreduce.default.reduce.input.mb; but that only affects the number of reducers.
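To be concrete, the three properties you asked about only drive how many reducers each MR step gets. A rough sketch of the calculation (simplified from CuboidJob's reducer sizing; the literal values below are just placeholders for your configured settings, not necessarily the defaults):

    // rough sketch of how the reducer count for a cuboid step is derived
    static int estimateReducers(long totalMapInputBytes) {
        double inputMB     = totalMapInputBytes / 1024.0 / 1024.0;
        double perReduceMB = 500.0;  // kylin.job.mapreduce.default.reduce.input.mb (example value)
        double countRatio  = 1.0;    // kylin.job.mapreduce.default.reduce.count.ratio (example value)
        int    maxReducers = 500;    // kylin.job.mapreduce.max.reducer.number (example value)
        int reducers = (int) Math.round(inputMB / perReduceMB * countRatio);
        return Math.max(1, Math.min(reducers, maxReducers));  // clamp to [1, maxReducers]
    }

So lowering reduce.input.mb or raising reduce.count.ratio gives you more reducers (up to the max), but none of this touches the mapper side.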
I suggest you investigate why only 1 mapper is started; factors like the Hadoop cluster size and the HDFS file block size will impact this (see the rough split-size sketch at the bottom of this mail). You can run a SQL query with hive -e (a query which needs MR, not a simple select *), and then use the MR job tracking URL to see how many mappers are triggered. If it is still a single mapper, then the problem is in your Hadoop configuration; otherwise it may be in Kylin, so check whether you put some additional parameter in conf/kylin_job_conf.xml.

On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]> wrote:

>Can I have the specification for these properties?
>
>KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
>"kylin.job.mapreduce.default.reduce.count.ratio";
>KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
>"kylin.job.mapreduce.default.reduce.input.mb";
>KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
>"kylin.job.mapreduce.max.reducer.number";
>
>Thanks!
>
>On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra <[email protected]>
>wrote:
>
>> Hi Shi,
>>
>> It's alright!
>> So I was wondering: my source Hive table is around 3 GB, and despite the
>> table being partitioned and holding around 50-70 MB of data per
>> partition, only a single mapper and reducer are spawned. The amount of
>> data being processed in the M/R is small, as expected, but it takes a
>> very long time.
>>
>> As mentioned in the trailing mail, the job is getting very slow; the
>> Build Base Cuboid Data step alone takes around 50 minutes to complete.
>>
>> I can tweak the reducer parameters you mentioned, but do you think that
>> will make a difference, since the mapper is where most of the time is
>> spent?
>>
>> Can you share your thoughts on performance tuning for the cube build?
>>
>> Thanks!
>>
>> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]> wrote:
>>
>>> Hi, sorry, a busy weekend;
>>>
>>> Usually Kylin will request a proper number of mappers and reducers; if
>>> you see a single mapper/reducer, how much is your input and output? If
>>> your cube is quite small, a single mapper/reducer is possible.
>>>
>>> The number of mappers is decided by the FileInputFormat, but the number
>>> of reducers is set by Kylin, see:
>>>
>>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
>>>
>>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]> wrote:
>>>
>>> >Urgent call, any follow up on this?
>>> >
>>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra <[email protected]>
>>> >wrote:
>>> >
>>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer running a
>>> >> single mapper/reducer for the job? Can I get an understanding of the
>>> >> reason for running it as a single mapper/reducer?
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra
>>> >> <[email protected]> wrote:
>>> >>
>>> >>> Hi All,
>>> >>>
>>> >>> I am building a cube using Kylin and I can see that the job is
>>> >>> running with a single mapper and reducer for some of the
>>> >>> intermediate steps, such as:
>>> >>>
>>> >>> Extract Fact Table Distinct Columns
>>> >>> Build Dimension Dictionary
>>> >>> Build N-Dimension Cuboid
>>> >>>
>>> >>> I am not sure of the reason for running the job with a single M/R;
>>> >>> is it really necessary, or is it some default config which can be
>>> >>> tweaked? It has been 70 minutes and the job status is 25%!
>>> >>>
>>> >>> Urgent Call!
>>> >>>
>>> >>> Thanks!
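One more note on the mapper side, since the number of mappers is decided by the FileInputFormat: the mapper count is essentially the number of input splits. A much-simplified sketch of how the split size (and hence the mapper count) is derived for splittable input (the block size here is only an example; non-splittable compressed files such as gzip get one mapper per file regardless):

    // simplified view of FileInputFormat split sizing for splittable input
    static long estimateMappers(long totalInputBytes) {
        long blockSize = 128L * 1024 * 1024;  // HDFS block size (example value)
        long minSize   = 1L;                  // mapreduce.input.fileinputformat.split.minsize
        long maxSize   = Long.MAX_VALUE;      // mapreduce.input.fileinputformat.split.maxsize
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        return (totalInputBytes + splitSize - 1) / splitSize;  // one mapper per split, roughly
    }

With a 3 GB splittable input and a 128 MB block size you would normally expect around 24 mappers; a single mapper usually means a single small or non-splittable input file, or a combine input format merging the splits.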
