Let me clarify:

1) Initially I had a cube, say cube1, with some dimensions and measures,
which was 420 MB in size.

2) Later I created another cube, say cube2, with the same dimensions as
cube1 but with added measures (cube1 had 1 measure while cube2 had 8),
which came out at 130 MB. I expected a larger size due to the additional
measures, but it was smaller than cube1.

3) Thereafter I created another cube, say cube3, with the same properties
as cube2, built it, and it was 120 MB (slightly smaller than the previous
build from the same data source).

So I was trying to find out why the cube size varies, and moreover, to my
surprise, cube2, which holds extra measures compared to cube1, is even
smaller in size.

I hope that makes things clear.

Thanks,

On Tue, Jun 16, 2015 at 1:25 PM, ShaoFeng Shi <[email protected]> wrote:

> I'm confused; you said these two cubes were exactly the same in an earlier
> reply, but now you mention the second one has more measures... they are NOT
> the same; different measures take different amounts of storage space. What
> kind of measures were defined in your former cube? Did you use a DISTINCT
> COUNT measure?
>
> 2015-06-16 3:28 GMT+08:00 Vineet Mishra <[email protected]>:
>
> > Hi Shi,
> >
> > Well, yes, these cubes are built for the same date range, without the
> > data in the table having changed, and moreover HBase compression has not
> > been touched so far.
> >
> > One interesting thing I noticed was that the later cube, which was
> > smaller in size and completed in less time, had multiple measures (around
> > 7 or 8), compared to the former cube, which had just 1 or perhaps 2
> > measures and took more time to complete.
> >
> > Running the same job again, it came up with a different size for the
> > same data; last time it was 130 MB, now it is even less, around 123 MB.
> > This change was observed within 4 hours of running the last job with the
> > same data set.
> >
> > I am not sure how Kylin is internally building these cube dimensions so
> > as to inflate the data size by 3x, especially when there are so few
> > measures in comparison.
> >
> > Let me know if you could get anything out of it.
> >
> > Thanks!
> >
> > On Mon, Jun 15, 2015 at 8:51 PM, Shi, Shaofeng <[email protected]> wrote:
> >
> > > Just a guess: do these two cube builds cover the same date range? Was
> > > the data in the fact table changed between the two builds? Was HBase
> > > compression enabled recently?
> > >
> > > You can build once more with the same cube and date range to see
> > > whether it can be reproduced; the cube size should be consistent, to my
> > > knowledge;
> > >
> > > On 6/15/15, 10:52 PM, "Vineet Mishra" <[email protected]> wrote:
> > >
> > > >Thanks Shi,
> > > >
> > > >By the way, I was curious: for the same table, dimensions, and
> > > >metrics, I could see the final cube size varying from 130 MB to 420 MB.
> > > >
> > > >The cube is exactly the same as the other one, but when the cube build
> > > >job is run I can see the size difference mentioned below,
> > > >
> > > >*Earlier Cube Build*
> > > >Size - 420 MB
> > > >Time taken for building cube - 140 mins
> > > >Source records - ~11 million
> > > >
> > > >*Today's Cube Build*
> > > >Size - 130 MB
> > > >Time taken for building cube - 22 mins
> > > >Source records - ~11 million
> > > >
> > > >I don't see any reason for a three-fold cube size difference. Any
> > > >ideas about this?
> > > >
> > > >Thanks!
> > > >
> > > >On Mon, Jun 15, 2015 at 8:08 PM, Shi, Shaofeng <[email protected]>
> > wrote:
> > > >
> > > >> Good catch, and thanks for sharing;
> > > >>
> > > >> On 6/15/15, 10:20 PM, "Vineet Mishra" <[email protected]>
> wrote:
> > > >>
> > > >> >Hi All,
> > > >> >
> > > >> >Well, I figured it out.
> > > >> >
> > > >> >It was the pre-processing step, which was converting my 3 GB Hive
> > > >> >table into a 60 MB sequence file; with the 128 MB block size, the job
> > > >> >was running with a single mapper.
> > > >> >
> > > >> >I changed the split size to 20 MB, and it was able to spawn 3 mappers
> > > >> >after that.
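[Editor's note: the split-size override described above is normally done with Hadoop's standard input-split properties. A minimal sketch of the relevant entry, assuming the Hadoop 2.x property name and that it is placed in conf/kylin_job_conf.xml; both details are assumptions, not confirmed by this thread.]

```xml
<!-- Cap each input split at ~20 MB so a 60 MB sequence file yields 3 mappers. -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>20971520</value> <!-- 20 MB in bytes -->
</property>
```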
> > > >> >
> > > >> >Anyways, thanks all for your quick response.
> > > >> >
> > > >> >Thanks!
> > > >> >
> > > >> >On Mon, Jun 15, 2015 at 3:27 PM, Vineet Mishra <
> > [email protected]
> > > >
> > > >> >wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> Sorry, I misunderstood this value; actually 30 MB was the output
> > > >> >> of that job. But I think I have found the reason (not yet
> > > >> >> confirmed) why it behaves like this: the input to this job is a
> > > >> >> sequence file which is 60 MB in size and compressed.
> > > >> >>
> > > >> >> No doubt, when it is passed to the mapper, a single mapper will be
> > > >> >> invoked due to the pre-configured block size. Now I need to check
> > > >> >> whether I can override the split-size property for the job.
> > > >> >>
> > > >> >> Moreover, I can see that the job is running extremely slowly;
> > > >> >> kindly find the log for the job below.
> > > >> >>
> > > >> >> 2015-06-13 10:08:00,940 INFO [main] org.apache.hadoop.mapred.MapTask:
> > > >> >> Processing split: hdfs://dev-hadoop-namenode.com:8020/tmp/kylin-9b675c66-4ce9-4a33-a356-2ccf9dbaca6a/kylin_intermediate_xyz_20150401000000_20150611000000_9b675c66_4ce9_4a33_a356_2ccf9dbaca6a/000000_0:0+60440271
> > > >> >> 2015-06-13 10:08:01,014 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> (EQUATOR) 0 kvi 67108860(268435440)
> > > >> >> 2015-06-13 10:08:01,015 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> mapreduce.task.io.sort.mb: 256
> > > >> >> 2015-06-13 10:08:01,015 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >>soft
> > > >> >> limit at 214748368
> > > >> >> 2015-06-13 10:08:01,015 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> bufstart = 0; bufvoid = 268435456
> > > >> >> 2015-06-13 10:08:01,015 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> kvstart = 67108860; length = 16777216
> > > >> >> 2015-06-13 10:08:01,023 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >>Map
> > > >> >> output collector class =
> > > >> >>org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> > > >> >> 2015-06-13 10:08:01,048 INFO [main]
> > > >> >> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully
> > loaded &
> > > >> >> initialized native-zlib library
> > > >> >> 2015-06-13 10:08:01,048 INFO [main]
> > > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new
> decompressor
> > > >> >> [.deflate]
> > > >> >> 2015-06-13 10:08:01,052 INFO [main]
> > > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new
> decompressor
> > > >> >> [.deflate]
> > > >> >> 2015-06-13 10:08:01,052 INFO [main]
> > > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new
> decompressor
> > > >> >> [.deflate]
> > > >> >> 2015-06-13 10:08:01,053 INFO [main]
> > > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new
> decompressor
> > > >> >> [.deflate]
> > > >> >> 2015-06-13 10:08:01,058 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path
> > for
> > > >> >>meta
> > > >> >> dir is
> > > >> >>
> > > >>
> > >
> >
> >>>>/yarn/nm/usercache/biops/appcache/application_1433833901375_0139/contai
> > > >>>>ne
> > > >> >>r_1433833901375_0139_01_000002/meta
> > > >> >> 2015-06-13 10:08:01,059 INFO [main]
> > > >>org.apache.kylin.common.KylinConfig:
> > > >> >> Use
> > > >> >>
> > > >>
> > >
> >
> >>>>KYLIN_CONF=/yarn/nm/usercache/biops/appcache/application_1433833901375_
> > > >>>>01
> > > >> >>39/container_1433833901375_0139_01_000002/meta
> > > >> >> 2015-06-13 10:08:01,084 INFO [main]
> > > >>org.apache.kylin.cube.CubeManager:
> > > >> >> Initializing CubeManager with config
> > > >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> > > >> >> 2015-06-13 10:08:01,086 INFO [main]
> > > >> >> org.apache.kylin.common.persistence.ResourceStore: Using metadata
> > url
> > > >> >> /yarn/nm/usercache/biops/filecache/19452/meta for resource store
> > > >> >> 2015-06-13 10:08:01,327 INFO [main]
> > > >> >>org.apache.kylin.cube.CubeDescManager:
> > > >> >> Initializing CubeDescManager with config
> > > >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> > > >> >> 2015-06-13 10:08:01,327 INFO [main]
> > > >> >>org.apache.kylin.cube.CubeDescManager:
> > > >> >> Reloading Cube Metadata from folder
> > > >> >> /yarn/nm/usercache/biops/filecache/19452/meta/cube_desc
> > > >> >> 2015-06-13 10:08:22,834 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:08:43,225 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:09:03,622 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:09:23,999 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:09:44,372 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:10:04,780 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:10:25,143 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:10:45,512 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:11:05,895 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:11:26,284 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:11:46,716 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:12:07,174 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:12:27,646 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:12:48,127 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:13:08,614 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:13:29,148 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:13:49,686 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:14:10,167 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:14:30,652 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 1900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:14:51,199 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:15:11,719 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:15:32,221 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:15:52,717 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:16:13,252 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:16:33,769 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:16:54,263 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:17:14,741 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:17:35,213 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:17:55,699 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 2900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:18:16,204 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:18:36,721 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:18:57,249 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:19:17,743 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:19:38,275 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:19:58,812 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:20:19,312 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:20:39,920 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:21:00,518 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:21:21,008 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 3900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:21:41,525 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:22:02,023 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:22:22,534 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:22:43,063 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:23:03,558 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:23:24,043 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:23:44,537 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:24:05,015 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:24:25,490 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:24:45,995 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 4900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:25:06,487 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:25:26,964 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:25:47,524 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:26:08,032 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:26:28,514 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:26:49,053 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:26:50,360 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> Spilling map output
> > > >> >> 2015-06-13 10:26:50,360 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> bufstart = 0; bufend = 126646464; bufvoid = 268435456
> > > >> >> 2015-06-13 10:26:50,360 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> kvstart = 67108860(268435440); kvend = 45083392(180333568);
> length
> > =
> > > >> >> 22025469/16777216
> > > >> >> 2015-06-13 10:26:50,360 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> (EQUATOR) 148671936 kvi 37167980(148671920)
> > > >> >> 2015-06-13 10:26:54,802 INFO [SpillThread]
> > > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> > > >> >>[.snappy]
> > > >> >> 2015-06-13 10:26:54,883 INFO [SpillThread]
> > > >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path
> > for
> > > >> >>meta
> > > >> >> dir is
> > > >> >>
> > > >>
> > >
> >
> >>>>/yarn/nm/usercache/biops/appcache/application_1433833901375_0139/contai
> > > >>>>ne
> > > >> >>r_1433833901375_0139_01_000002/meta
> > > >> >> 2015-06-13 10:27:09,614 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:27:30,135 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:27:50,657 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:27:59,691 INFO [SpillThread]
> > > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 100000
> > > >>records!
> > > >> >> 2015-06-13 10:28:11,185 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 5900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:28:31,751 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:28:52,303 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:29:03,235 INFO [SpillThread]
> > > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 200000
> > > >>records!
> > > >> >> 2015-06-13 10:29:12,873 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:29:33,424 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:29:54,168 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:30:14,771 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:30:18,479 INFO [SpillThread]
> > > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 300000
> > > >>records!
> > > >> >> 2015-06-13 10:30:35,325 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:30:39,831 INFO [SpillThread]
> > > >> >> org.apache.hadoop.mapred.MapTask: Finished spill 0
> > > >> >> 2015-06-13 10:30:39,831 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> (RESET) equator 148671936 kv 37167980(148671920) kvi
> > > >>32705668(130822672)
> > > >> >> 2015-06-13 10:30:55,807 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:31:16,301 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:31:36,804 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 6900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:31:57,302 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:32:17,790 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:32:38,294 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:32:58,830 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:33:19,354 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:33:39,866 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:34:00,371 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:34:20,872 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:34:41,387 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:35:01,884 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 7900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:35:22,368 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:35:42,868 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:36:03,361 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:36:23,981 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:36:44,529 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:37:05,034 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:37:25,609 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:37:46,152 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:38:06,725 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:38:27,296 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 8900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:38:47,905 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:39:08,430 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:39:28,952 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:39:49,460 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:40:09,987 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:40:30,551 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:40:51,080 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:41:11,579 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:41:32,122 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:41:52,623 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 9900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:42:13,128 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:42:33,628 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10100000
> > > >> >>records!
> > > >> >> 2015-06-13 10:42:54,128 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10200000
> > > >> >>records!
> > > >> >> 2015-06-13 10:43:14,629 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10300000
> > > >> >>records!
> > > >> >> 2015-06-13 10:43:35,149 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10400000
> > > >> >>records!
> > > >> >> 2015-06-13 10:43:55,764 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10500000
> > > >> >>records!
> > > >> >> 2015-06-13 10:44:16,303 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10600000
> > > >> >>records!
> > > >> >> 2015-06-13 10:44:36,857 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10700000
> > > >> >>records!
> > > >> >> 2015-06-13 10:44:57,349 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10800000
> > > >> >>records!
> > > >> >> 2015-06-13 10:45:17,840 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 10900000
> > > >> >>records!
> > > >> >> 2015-06-13 10:45:38,441 INFO [main]
> > > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled
> 11000000
> > > >> >>records!
> > > >> >> 2015-06-13 10:45:41,057 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> Spilling map output
> > > >> >> 2015-06-13 10:45:41,057 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> bufstart = 148671936; bufend = 6882957; bufvoid = 268435443
> > > >> >> 2015-06-13 10:45:41,057 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> kvstart = 37167980(148671920); kvend = 15142512(60570048);
> length =
> > > >> >> 22025469/16777216
> > > >> >> 2015-06-13 10:45:41,057 INFO [main]
> > org.apache.hadoop.mapred.MapTask:
> > > >> >> (EQUATOR) 28908413 kvi 7227096(28908384)
> > > >> >>
> > > >> >> Thanks,
> > > >> >>
> > > >> >> On Mon, Jun 15, 2015 at 3:11 PM, 周千昊 <[email protected]> wrote:
> > > >> >>
> > > >> >>> Hi Mishra,
> > > >> >>>      It is as you described: since the data size is only 30 MB,
> > > >> >>> Hadoop will run the MR job with a single mapper and reducer.
> > > >> >>>      As for your question about why it takes so long on such a
> > > >> >>> small dataset, can you please dig into the *map reduce task
> > > >> >>> status* web page to check how much time the MR job really takes,
> > > >> >>> so that we can determine whether the time is spent in the MR job
> > > >> >>> or in the Kylin job scheduling module.
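[Editor's note: the single-mapper behaviour described above follows from simple split arithmetic. A rough sketch, illustrative only and not Kylin or Hadoop code; real FileInputFormat split planning also considers the minimum split size and block boundaries.]

```python
import math

MB = 1024 * 1024

def approx_num_splits(file_size_bytes: int, split_size_bytes: int) -> int:
    """Approximate number of input splits (hence mappers) for a single
    splittable file: ceil(file size / split size), at least 1."""
    return max(1, math.ceil(file_size_bytes / split_size_bytes))

# A 60 MB sequence file with the default 128 MB split size -> 1 mapper.
print(approx_num_splits(60 * MB, 128 * MB))  # 1
# The same file after lowering the split size to 20 MB -> 3 mappers.
print(approx_num_splits(60 * MB, 20 * MB))   # 3
```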
> > > >> >>>
> > > >> >>> On Mon, Jun 15, 2015 at 4:51 PM, Vineet Mishra
> > > >> >>> <[email protected]> wrote:
> > > >> >>>
> > > >> >>> > Shi,
> > > >> >>> >
> > > >> >>> > Hadoop is set up correctly on my cluster with the default block
> > > >> >>> > size of 128 MB, and it is indeed running multiple mapper/reducer
> > > >> >>> > jobs for other cases.
> > > >> >>> >
> > > >> >>> > It is only the Kylin cube build that is running through a single
> > > >> >>> > M/R job.
> > > >> >>> >
> > > >> >>> > Moreover, to my surprise, the 4th running job, Build Base Cuboid
> > > >> >>> > Data, shows a data size of 30 MB. Is that the reason a single
> > > >> >>> > mapper is getting invoked? If so, then why does it take around
> > > >> >>> > 50 minutes to process such a small data set?
> > > >> >>> >
> > > >> >>> > Thanks,
> > > >> >>> >
> > > >> >>> > On Mon, Jun 15, 2015 at 11:17 AM, Shi, Shaofeng <
> > [email protected]
> > > >
> > > >> >>> wrote:
> > > >> >>> >
> > > >> >>> > > Yes, you can adjust these parameters, for example give a
> > > >> >>> > > smaller value for
> > > >> >>> > > kylin.job.mapreduce.default.reduce.input.mb; but it only
> > > >> >>> > > affects the reducer number.
> > > >> >>> > >
> > > >> >>> > > I suggest you investigate why only 1 mapper is started; some
> > > >> >>> > > factors, like the Hadoop cluster size and the HDFS file block
> > > >> >>> > > size, will impact this. You can run a SQL query with hive -e
> > > >> >>> > > (a query which needs to run MR, not a simple select *), and
> > > >> >>> > > then use the MR job tracking URL to see how many mappers are
> > > >> >>> > > triggered. If it is still a single one, then the problem is in
> > > >> >>> > > your Hadoop configuration; otherwise it may be in Kylin, in
> > > >> >>> > > which case check whether you put some additional parameters in
> > > >> >>> > > conf/kylin_job_conf.xml.
> > > >> >>> > >
> > > >> >>> > >
> > > >> >>> > > On 6/15/15, 2:52 AM, "Vineet Mishra"
> > > >> >>> > > <[email protected]> wrote:
> > > >> >>> > >
> > > >> >>> > > >Can I have the specifications for these properties?
> > > >> >>> > > >
> > > >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
> > > >> >>> > > >"kylin.job.mapreduce.default.reduce.count.ratio";
> > > >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
> > > >> >>> > > >"kylin.job.mapreduce.default.reduce.input.mb";
> > > >> >>> > > >KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
> > > >> >>> > > >"kylin.job.mapreduce.max.reducer.number";
> > > >> >>> > > >
> > > >> >>> > > >Thanks!
> > > >> >>> > > >
> > > >> >>> > > >On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra <
> > > >> >>> [email protected]
> > > >> >>> > >
> > > >> >>> > > >wrote:
> > > >> >>> > > >
> > > >> >>> > > >> Hi Shi,
> > > >> >>> > > >>
> > > >> >>> > > >> It's alright!
> > > >> >>> > > >> So I was wondering: my source Hive table is around 3 GB,
> > > >> >>> > > >> and despite the table being partitioned and holding around
> > > >> >>> > > >> 50-70 MB of data per partition, only a single mapper and
> > > >> >>> > > >> reducer are spawned. The amount of data being processed in
> > > >> >>> > > >> the M/R job is small, as expected, but it takes a very long
> > > >> >>> > > >> time.
> > > >> >>> > > >>
> > > >> >>> > > >> As mentioned in the earlier mail, the job is getting very
> > > >> >>> > > >> slow; the Build Base Cuboid Data step alone takes around
> > > >> >>> > > >> 50 minutes to complete.
> > > >> >>> > > >>
> > > >> >>> > > >> I can tweak the reducer parameter you mentioned, but do you
> > > >> >>> > > >> think that will make a difference, since most of the time
> > > >> >>> > > >> is spent in the mapper?
> > > >> >>> > > >>
> > > >> >>> > > >> Can you share your thoughts on performance tuning for the
> > > >> >>> > > >> cube build?
> > > >> >>> > > >>
> > > >> >>> > > >> Thanks!
> > > >> >>> > > >>
> > > >> >>> > > >> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]> wrote:
> > > >> >>> > > >>
> > > >> >>> > > >>> Hi, sorry, a busy weekend;
> > > >> >>> > > >>>
> > > >> >>> > > >>> Usually Kylin will request a proper number of mappers and
> > > >> >>> > > >>> reducers; if you see a single mapper/reducer, how much input
> > > >> >>> > > >>> and output does the job have? If your cube is quite small, a
> > > >> >>> > > >>> single mapper/reducer is possible;
> > > >> >>> > > >>>
> > > >> >>> > > >>> The number of mappers is decided by the FileInputFormat, but
> > > >> >>> > > >>> the number of reducers is set by Kylin, see:
> > > >> >>> > > >>>
> > > >> >>> > > >>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
> > > >> >>> > > >>>
> > > >> >>> > > >>>
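The logic at the linked line is a size-based heuristic: estimate reducers from total map input, scale by a ratio, and clamp to a cap. A rough Python sketch, assuming defaults of 500 MB per reducer, a ratio of 1.0, and a cap of 500 — the exact constants and rounding in CuboidJob.java may differ by Kylin version:

```python
def estimate_reducers(total_input_bytes: int,
                      per_reducer_input_mb: float = 500.0,
                      count_ratio: float = 1.0,
                      max_reducers: int = 500) -> int:
    """Size-based reducer estimate: one reducer per `per_reducer_input_mb`
    of map input, scaled by `count_ratio`, clamped to [1, max_reducers]."""
    estimate = round(total_input_bytes / (per_reducer_input_mb * 1024 * 1024)
                     * count_ratio)
    return max(1, min(max_reducers, int(estimate)))

# A 3 GB cuboid input yields ~6 reducers; a 70 MB input rounds down and
# hits the floor of 1, which is why small builds show a single reducer.
print(estimate_reducers(3 * 1024**3))   # 6
print(estimate_reducers(70 * 1024**2))  # 1
```

Raising the count ratio or lowering the per-reducer input MB increases reduce-side parallelism; neither affects the mapper count, which FileInputFormat derives from the input splits.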
> > > >> >>> > > >>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]> wrote:
> > > >> >>> > > >>>
> > > >> >>> > > >>> >Urgent call, any follow up on this?
> > > >> >>> > > >>> >
> > > >> >>> > > >>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra <[email protected]> wrote:
> > > >> >>> > > >>> >
> > > >> >>> > > >>> >>
> > > >> >>> > > >>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer
> > > >> >>> > > >>> >> running a single mapper/reducer for the job? Can I have
> > > >> >>> > > >>> >> the reasoning behind running it as a single
> > > >> >>> > > >>> >> mapper/reducer?
> > > >> >>> > > >>> >>
> > > >> >>> > > >>> >> Thanks!
> > > >> >>> > > >>> >>
> > > >> >>> > > >>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra <[email protected]> wrote:
> > > >> >>> > > >>> >>
> > > >> >>> > > >>> >>> Hi All,
> > > >> >>> > > >>> >>>
> > > >> >>> > > >>> >>> I am building a cube using Kylin, and I can see that the
> > > >> >>> > > >>> >>> job is running with a single mapper and reducer for some
> > > >> >>> > > >>> >>> of the intermediate steps, such as:
> > > >> >>> > > >>> >>>
> > > >> >>> > > >>> >>> Extract Fact Table Distinct Columns
> > > >> >>> > > >>> >>> Build Dimension Dictionary
> > > >> >>> > > >>> >>> Build N-Dimension Cuboid
> > > >> >>> > > >>> >>>
> > > >> >>> > > >>> >>> I am not sure what's the reason behind running the job
> > > >> >>> > > >>> >>> with a single M/R; is it really necessary, or is it some
> > > >> >>> > > >>> >>> default config which can be tweaked? It's been 70 mins
> > > >> >>> > > >>> >>> and the job status is 25%!
> > > >> >>> > > >>> >>>
> > > >> >>> > > >>> >>> Urgent Call!
> > > >> >>> > > >>> >>>
> > > >> >>> > > >>> >>> Thanks!