I'm confused; you said these two cubes were exactly the same in an earlier
reply, but now you mention the second has more measures... they are NOT the
same; different measures take different amounts of storage space. What kind
of measures were defined in your former cube? Did you use a DISTINCT COUNT
measure?
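For context on why the measure type matters so much for storage: a plain SUM measure stores one fixed-width number per aggregated row, while Kylin's distinct-count measure stores a HyperLogLog sketch whose size is fixed by its precision. The sketch below is a rough illustration only, assuming a dense HLL layout of 2^p one-byte registers; Kylin's actual serialization and compression differ, and the function names are hypothetical:

```python
def sum_measure_bytes(num_rows: int) -> int:
    # A SUM over a bigint column costs roughly 8 bytes per aggregated row.
    return num_rows * 8

def hll_measure_bytes(num_rows: int, precision: int = 14) -> int:
    # A dense HyperLogLog sketch keeps 2**precision registers; assuming
    # 1 byte per register, that is ~16 KB per aggregated row at precision 14.
    return num_rows * (2 ** precision)

rows = 1_000_000
print(sum_measure_bytes(rows))   # on the order of 8 MB of measure data
print(hll_measure_bytes(rows))   # orders of magnitude larger before compression
```

The point of the comparison: two cubes that look alike dimensionally can differ several-fold in size purely because of which measures they carry.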

2015-06-16 3:28 GMT+08:00 Vineet Mishra <[email protected]>:

> Hi Shi,
>
> Well, yes, these cubes were built for the same date range without the data
> on the table changing, and moreover the HBase compression settings were not
> touched at all.
>
> One interesting thing I noticed is that the later cube, which was smaller
> in size and completed in less time, had multiple measures (around 7 or 8),
> while the former cube had just 1 or perhaps 2 measures yet took more time
> to complete.
>
> Running the same job again, it came up with a different size for the same
> data; last time it was 130 MB, now it's even less, around 123 MB. This
> change was observed within 4 hours of running the last job with the same
> data set.
>
> I am not sure how Kylin is internally building these cube dimensions such
> that the data size is inflated by 3x, especially when the measures are
> comparatively so few.
>
> Let me know if you could get anything out of it.
>
> Thanks!
>
> On Mon, Jun 15, 2015 at 8:51 PM, Shi, Shaofeng <[email protected]> wrote:
>
> > Just a guess: do these two cube builds cover the same date range? Was
> > the data in the fact table changed between the two builds? Was
> > compression enabled in HBase recently?
> >
> > You can build once more with the same cube and date range to see whether
> > it can be reproduced; the cube size should be consistent, to my
> > knowledge.
> >
> > On 6/15/15, 10:52 PM, "Vineet Mishra" <[email protected]> wrote:
> >
> > >Thanks Shi,
> > >
> > >By the way, I was curious: for the same table, dimensions and metrics,
> > >I could see the final cube size varying from 130 MB to 420 MB.
> > >
> > >The cube is exactly the same as another one, but when the cube build
> > >job is run I see the size difference mentioned below:
> > >
> > >*Earlier Cube Build*
> > >Size - 420 MB
> > >Time taken for building cube - 140 mins
> > >Source records - ~11 million
> > >
> > >*Today's Cube Build*
> > >Size - 130 MB
> > >Time taken for building cube - 22 mins
> > >Source records - ~11 million
> > >
> > >I don't see any reason for a three-times difference in cube size. Any
> > >ideas about this?
> > >
> > >Thanks!
> > >
> > >On Mon, Jun 15, 2015 at 8:08 PM, Shi, Shaofeng <[email protected]>
> wrote:
> > >
> > >> Good catch, and thanks for sharing;
> > >>
> > >> On 6/15/15, 10:20 PM, "Vineet Mishra" <[email protected]> wrote:
> > >>
> > >> >Hi All,
> > >> >
> > >> >Well I got it through.
> > >> >
> > >> >Actually, it was the pre-processing step, which converts my 3 GB
> > >> >Hive table into a 60 MB sequence file; because of the 128 MB block
> > >> >size, that file was being processed by a single mapper.
> > >> >
> > >> >I changed the split size to 20 MB and it was able to spawn 3 mappers
> > >> >after that.
> > >> >
> > >> >Anyways, thanks all for your quick response.
> > >> >
> > >> >Thanks!
> > >> >
> > >> >On Mon, Jun 15, 2015 at 3:27 PM, Vineet Mishra <
> [email protected]
> > >
> > >> >wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> Sorry, I misunderstood this value; actually, 30 MB was the output
> > >> >> of that job. But I think I have found the reason (not yet
> > >> >> confirmed) why it is behaving like this: the input to this job is a
> > >> >> sequence file which is 60 MB in size and compressed.
> > >> >>
> > >> >> When that file is passed to the mapper, a single mapper is certain
> > >> >> to be invoked because of the pre-configured block size. Now I need
> > >> >> to check if I can override the split size property for the job.
> > >> >>
> > >> >> Moreover, I can see that the job is running extremely slowly;
> > >> >> kindly find the log for the job below.
> > >> >>
> > >> >> 2015-06-13 10:08:00,940 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> Processing split: hdfs://dev-hadoop-namenode.com:8020/tmp/kylin-9b675c66-4ce9-4a33-a356-2ccf9dbaca6a/kylin_intermediate_xyz_20150401000000_20150611000000_9b675c66_4ce9_4a33_a356_2ccf9dbaca6a/000000_0:0+60440271
> > >> >> 2015-06-13 10:08:01,014 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> (EQUATOR) 0 kvi 67108860(268435440)
> > >> >> 2015-06-13 10:08:01,015 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> mapreduce.task.io.sort.mb: 256
> > >> >> 2015-06-13 10:08:01,015 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >>soft
> > >> >> limit at 214748368
> > >> >> 2015-06-13 10:08:01,015 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> bufstart = 0; bufvoid = 268435456
> > >> >> 2015-06-13 10:08:01,015 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> kvstart = 67108860; length = 16777216
> > >> >> 2015-06-13 10:08:01,023 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >>Map
> > >> >> output collector class =
> > >> >>org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> > >> >> 2015-06-13 10:08:01,048 INFO [main]
> > >> >> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully
> loaded &
> > >> >> initialized native-zlib library
> > >> >> 2015-06-13 10:08:01,048 INFO [main]
> > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> > >> >> [.deflate]
> > >> >> 2015-06-13 10:08:01,052 INFO [main]
> > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> > >> >> [.deflate]
> > >> >> 2015-06-13 10:08:01,052 INFO [main]
> > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> > >> >> [.deflate]
> > >> >> 2015-06-13 10:08:01,053 INFO [main]
> > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> > >> >> [.deflate]
> > >> >> 2015-06-13 10:08:01,058 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> > >> >> 2015-06-13 10:08:01,059 INFO [main]
> > >> >> org.apache.kylin.common.KylinConfig: Use KYLIN_CONF=/yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> > >> >> 2015-06-13 10:08:01,084 INFO [main]
> > >>org.apache.kylin.cube.CubeManager:
> > >> >> Initializing CubeManager with config
> > >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> > >> >> 2015-06-13 10:08:01,086 INFO [main]
> > >> >> org.apache.kylin.common.persistence.ResourceStore: Using metadata
> url
> > >> >> /yarn/nm/usercache/biops/filecache/19452/meta for resource store
> > >> >> 2015-06-13 10:08:01,327 INFO [main]
> > >> >>org.apache.kylin.cube.CubeDescManager:
> > >> >> Initializing CubeDescManager with config
> > >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> > >> >> 2015-06-13 10:08:01,327 INFO [main]
> > >> >>org.apache.kylin.cube.CubeDescManager:
> > >> >> Reloading Cube Metadata from folder
> > >> >> /yarn/nm/usercache/biops/filecache/19452/meta/cube_desc
> > >> >> 2015-06-13 10:08:22,834 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 100000
> > >> >>records!
> > >> >> 2015-06-13 10:08:43,225 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 200000
> > >> >>records!
> > >> >> 2015-06-13 10:09:03,622 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 300000
> > >> >>records!
> > >> >> 2015-06-13 10:09:23,999 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 400000
> > >> >>records!
> > >> >> 2015-06-13 10:09:44,372 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 500000
> > >> >>records!
> > >> >> 2015-06-13 10:10:04,780 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 600000
> > >> >>records!
> > >> >> 2015-06-13 10:10:25,143 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 700000
> > >> >>records!
> > >> >> 2015-06-13 10:10:45,512 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 800000
> > >> >>records!
> > >> >> 2015-06-13 10:11:05,895 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 900000
> > >> >>records!
> > >> >> 2015-06-13 10:11:26,284 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1000000
> > >> >>records!
> > >> >> 2015-06-13 10:11:46,716 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1100000
> > >> >>records!
> > >> >> 2015-06-13 10:12:07,174 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1200000
> > >> >>records!
> > >> >> 2015-06-13 10:12:27,646 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1300000
> > >> >>records!
> > >> >> 2015-06-13 10:12:48,127 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1400000
> > >> >>records!
> > >> >> 2015-06-13 10:13:08,614 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1500000
> > >> >>records!
> > >> >> 2015-06-13 10:13:29,148 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1600000
> > >> >>records!
> > >> >> 2015-06-13 10:13:49,686 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1700000
> > >> >>records!
> > >> >> 2015-06-13 10:14:10,167 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1800000
> > >> >>records!
> > >> >> 2015-06-13 10:14:30,652 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1900000
> > >> >>records!
> > >> >> 2015-06-13 10:14:51,199 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2000000
> > >> >>records!
> > >> >> 2015-06-13 10:15:11,719 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2100000
> > >> >>records!
> > >> >> 2015-06-13 10:15:32,221 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2200000
> > >> >>records!
> > >> >> 2015-06-13 10:15:52,717 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2300000
> > >> >>records!
> > >> >> 2015-06-13 10:16:13,252 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2400000
> > >> >>records!
> > >> >> 2015-06-13 10:16:33,769 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2500000
> > >> >>records!
> > >> >> 2015-06-13 10:16:54,263 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2600000
> > >> >>records!
> > >> >> 2015-06-13 10:17:14,741 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2700000
> > >> >>records!
> > >> >> 2015-06-13 10:17:35,213 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2800000
> > >> >>records!
> > >> >> 2015-06-13 10:17:55,699 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2900000
> > >> >>records!
> > >> >> 2015-06-13 10:18:16,204 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3000000
> > >> >>records!
> > >> >> 2015-06-13 10:18:36,721 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3100000
> > >> >>records!
> > >> >> 2015-06-13 10:18:57,249 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3200000
> > >> >>records!
> > >> >> 2015-06-13 10:19:17,743 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3300000
> > >> >>records!
> > >> >> 2015-06-13 10:19:38,275 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3400000
> > >> >>records!
> > >> >> 2015-06-13 10:19:58,812 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3500000
> > >> >>records!
> > >> >> 2015-06-13 10:20:19,312 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3600000
> > >> >>records!
> > >> >> 2015-06-13 10:20:39,920 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3700000
> > >> >>records!
> > >> >> 2015-06-13 10:21:00,518 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3800000
> > >> >>records!
> > >> >> 2015-06-13 10:21:21,008 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3900000
> > >> >>records!
> > >> >> 2015-06-13 10:21:41,525 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4000000
> > >> >>records!
> > >> >> 2015-06-13 10:22:02,023 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4100000
> > >> >>records!
> > >> >> 2015-06-13 10:22:22,534 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4200000
> > >> >>records!
> > >> >> 2015-06-13 10:22:43,063 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4300000
> > >> >>records!
> > >> >> 2015-06-13 10:23:03,558 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4400000
> > >> >>records!
> > >> >> 2015-06-13 10:23:24,043 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4500000
> > >> >>records!
> > >> >> 2015-06-13 10:23:44,537 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4600000
> > >> >>records!
> > >> >> 2015-06-13 10:24:05,015 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4700000
> > >> >>records!
> > >> >> 2015-06-13 10:24:25,490 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4800000
> > >> >>records!
> > >> >> 2015-06-13 10:24:45,995 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4900000
> > >> >>records!
> > >> >> 2015-06-13 10:25:06,487 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5000000
> > >> >>records!
> > >> >> 2015-06-13 10:25:26,964 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5100000
> > >> >>records!
> > >> >> 2015-06-13 10:25:47,524 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5200000
> > >> >>records!
> > >> >> 2015-06-13 10:26:08,032 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5300000
> > >> >>records!
> > >> >> 2015-06-13 10:26:28,514 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5400000
> > >> >>records!
> > >> >> 2015-06-13 10:26:49,053 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5500000
> > >> >>records!
> > >> >> 2015-06-13 10:26:50,360 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> Spilling map output
> > >> >> 2015-06-13 10:26:50,360 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> bufstart = 0; bufend = 126646464; bufvoid = 268435456
> > >> >> 2015-06-13 10:26:50,360 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> kvstart = 67108860(268435440); kvend = 45083392(180333568); length
> =
> > >> >> 22025469/16777216
> > >> >> 2015-06-13 10:26:50,360 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> (EQUATOR) 148671936 kvi 37167980(148671920)
> > >> >> 2015-06-13 10:26:54,802 INFO [SpillThread]
> > >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> > >> >>[.snappy]
> > >> >> 2015-06-13 10:26:54,883 INFO [SpillThread]
> > >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> > >> >> 2015-06-13 10:27:09,614 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5600000
> > >> >>records!
> > >> >> 2015-06-13 10:27:30,135 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5700000
> > >> >>records!
> > >> >> 2015-06-13 10:27:50,657 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5800000
> > >> >>records!
> > >> >> 2015-06-13 10:27:59,691 INFO [SpillThread]
> > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 100000
> > >>records!
> > >> >> 2015-06-13 10:28:11,185 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5900000
> > >> >>records!
> > >> >> 2015-06-13 10:28:31,751 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6000000
> > >> >>records!
> > >> >> 2015-06-13 10:28:52,303 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6100000
> > >> >>records!
> > >> >> 2015-06-13 10:29:03,235 INFO [SpillThread]
> > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 200000
> > >>records!
> > >> >> 2015-06-13 10:29:12,873 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6200000
> > >> >>records!
> > >> >> 2015-06-13 10:29:33,424 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6300000
> > >> >>records!
> > >> >> 2015-06-13 10:29:54,168 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6400000
> > >> >>records!
> > >> >> 2015-06-13 10:30:14,771 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6500000
> > >> >>records!
> > >> >> 2015-06-13 10:30:18,479 INFO [SpillThread]
> > >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 300000
> > >>records!
> > >> >> 2015-06-13 10:30:35,325 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6600000
> > >> >>records!
> > >> >> 2015-06-13 10:30:39,831 INFO [SpillThread]
> > >> >> org.apache.hadoop.mapred.MapTask: Finished spill 0
> > >> >> 2015-06-13 10:30:39,831 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> (RESET) equator 148671936 kv 37167980(148671920) kvi
> > >>32705668(130822672)
> > >> >> 2015-06-13 10:30:55,807 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6700000
> > >> >>records!
> > >> >> 2015-06-13 10:31:16,301 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6800000
> > >> >>records!
> > >> >> 2015-06-13 10:31:36,804 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6900000
> > >> >>records!
> > >> >> 2015-06-13 10:31:57,302 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7000000
> > >> >>records!
> > >> >> 2015-06-13 10:32:17,790 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7100000
> > >> >>records!
> > >> >> 2015-06-13 10:32:38,294 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7200000
> > >> >>records!
> > >> >> 2015-06-13 10:32:58,830 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7300000
> > >> >>records!
> > >> >> 2015-06-13 10:33:19,354 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7400000
> > >> >>records!
> > >> >> 2015-06-13 10:33:39,866 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7500000
> > >> >>records!
> > >> >> 2015-06-13 10:34:00,371 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7600000
> > >> >>records!
> > >> >> 2015-06-13 10:34:20,872 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7700000
> > >> >>records!
> > >> >> 2015-06-13 10:34:41,387 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7800000
> > >> >>records!
> > >> >> 2015-06-13 10:35:01,884 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7900000
> > >> >>records!
> > >> >> 2015-06-13 10:35:22,368 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8000000
> > >> >>records!
> > >> >> 2015-06-13 10:35:42,868 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8100000
> > >> >>records!
> > >> >> 2015-06-13 10:36:03,361 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8200000
> > >> >>records!
> > >> >> 2015-06-13 10:36:23,981 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8300000
> > >> >>records!
> > >> >> 2015-06-13 10:36:44,529 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8400000
> > >> >>records!
> > >> >> 2015-06-13 10:37:05,034 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8500000
> > >> >>records!
> > >> >> 2015-06-13 10:37:25,609 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8600000
> > >> >>records!
> > >> >> 2015-06-13 10:37:46,152 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8700000
> > >> >>records!
> > >> >> 2015-06-13 10:38:06,725 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8800000
> > >> >>records!
> > >> >> 2015-06-13 10:38:27,296 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8900000
> > >> >>records!
> > >> >> 2015-06-13 10:38:47,905 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9000000
> > >> >>records!
> > >> >> 2015-06-13 10:39:08,430 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9100000
> > >> >>records!
> > >> >> 2015-06-13 10:39:28,952 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9200000
> > >> >>records!
> > >> >> 2015-06-13 10:39:49,460 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9300000
> > >> >>records!
> > >> >> 2015-06-13 10:40:09,987 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9400000
> > >> >>records!
> > >> >> 2015-06-13 10:40:30,551 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9500000
> > >> >>records!
> > >> >> 2015-06-13 10:40:51,080 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9600000
> > >> >>records!
> > >> >> 2015-06-13 10:41:11,579 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9700000
> > >> >>records!
> > >> >> 2015-06-13 10:41:32,122 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9800000
> > >> >>records!
> > >> >> 2015-06-13 10:41:52,623 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9900000
> > >> >>records!
> > >> >> 2015-06-13 10:42:13,128 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10000000
> > >> >>records!
> > >> >> 2015-06-13 10:42:33,628 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10100000
> > >> >>records!
> > >> >> 2015-06-13 10:42:54,128 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10200000
> > >> >>records!
> > >> >> 2015-06-13 10:43:14,629 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10300000
> > >> >>records!
> > >> >> 2015-06-13 10:43:35,149 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10400000
> > >> >>records!
> > >> >> 2015-06-13 10:43:55,764 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10500000
> > >> >>records!
> > >> >> 2015-06-13 10:44:16,303 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10600000
> > >> >>records!
> > >> >> 2015-06-13 10:44:36,857 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10700000
> > >> >>records!
> > >> >> 2015-06-13 10:44:57,349 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10800000
> > >> >>records!
> > >> >> 2015-06-13 10:45:17,840 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10900000
> > >> >>records!
> > >> >> 2015-06-13 10:45:38,441 INFO [main]
> > >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 11000000
> > >> >>records!
> > >> >> 2015-06-13 10:45:41,057 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> Spilling map output
> > >> >> 2015-06-13 10:45:41,057 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> bufstart = 148671936; bufend = 6882957; bufvoid = 268435443
> > >> >> 2015-06-13 10:45:41,057 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> kvstart = 37167980(148671920); kvend = 15142512(60570048); length =
> > >> >> 22025469/16777216
> > >> >> 2015-06-13 10:45:41,057 INFO [main]
> org.apache.hadoop.mapred.MapTask:
> > >> >> (EQUATOR) 28908413 kvi 7227096(28908384)
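The spill bookkeeping in the log above follows from two settings: `mapreduce.task.io.sort.mb` (the in-memory map output buffer) and `mapreduce.map.sort.spill.percent` (Hadoop's default is 0.80). A hedged back-of-envelope check against the logged values; Hadoop's actual MapOutputBuffer aligns the buffer to metadata-record boundaries, so the logged soft limit can differ by a few bytes:

```python
io_sort_mb = 256       # from the log line: "mapreduce.task.io.sort.mb: 256"
spill_percent = 0.80   # assumed Hadoop default for mapreduce.map.sort.spill.percent

buffer_bytes = io_sort_mb * 1024 * 1024
soft_limit = int(buffer_bytes * spill_percent)

# buffer_bytes matches "bufvoid = 268435456" in the log; soft_limit comes
# out within a few bytes of the logged "soft limit at 214748368".
print(buffer_bytes)
print(soft_limit)
```

When map output crosses that soft limit, the "Spilling map output" lines above appear and a SpillThread sorts and writes the buffered records to disk.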
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> On Mon, Jun 15, 2015 at 3:11 PM, 周千昊 <[email protected]> wrote:
> > >> >>
> > >> >>> Hi Mishra,
> > >> >>>      It is as you described: since the data size is only 30 MB,
> > >> >>> Hadoop will run the MR job with a single mapper and reducer.
> > >> >>>      As for your question about why it takes so long on such a
> > >> >>> small dataset, could you dig into the *map reduce task status* web
> > >> >>> page to check how much time the MR job really takes, so that we
> > >> >>> can determine whether the time is spent in the MR job or in
> > >> >>> Kylin's job scheduling module?
> > >> >>>
> > >> >>> Vineet Mishra <[email protected]>于2015年6月15日周一 下午4:51写道:
> > >> >>>
> > >> >>> > Shi,
> > >> >>> >
> > >> >>> > Hadoop is set up correctly on my cluster with the default block
> > >> >>> > size of 128 MB, and it is indeed running multiple mappers and
> > >> >>> > reducers for other jobs.
> > >> >>> >
> > >> >>> > It is only the Kylin cube build that runs as a single M/R job.
> > >> >>> >
> > >> >>> > Moreover, to my surprise, the 4th step, Build Base Cuboid Data,
> > >> >>> > shows a data size of 30 MB. Is that the reason a single mapper
> > >> >>> > is invoked? If so, why does it take around 50 minutes to process
> > >> >>> > such a small data set?
> > >> >>> >
> > >> >>> > Thanks,
> > >> >>> >
> > >> >>> > On Mon, Jun 15, 2015 at 11:17 AM, Shi, Shaofeng <
> [email protected]
> > >
> > >> >>> wrote:
> > >> >>> >
> > >> >>> > > Yes, you can adjust these parameters; for example, give a
> > >> >>> > > smaller value for
> > >> >>> > > kylin.job.mapreduce.default.reduce.input.mb, but that only
> > >> >>> > > affects the reducer number.
> > >> >>> > >
> > >> >>> > > I suggest you investigate why only 1 mapper is started;
> > >> >>> > > factors like Hadoop cluster size and HDFS block size will
> > >> >>> > > impact this. You can run a SQL query with hive -e (a query
> > >> >>> > > which needs to run MR, not a simple select *), and then use
> > >> >>> > > the MR job tracking URL to see how many mappers are triggered.
> > >> >>> > > If it is still a single mapper, then the problem is in your
> > >> >>> > > Hadoop configuration; otherwise it may be in Kylin, so check
> > >> >>> > > whether you put some additional parameters in
> > >> >>> > > conf/kylin_job_conf.xml.
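Per the CuboidJob.java link shared later in this thread, Kylin's reducer count is roughly the estimated input size divided by `kylin.job.mapreduce.default.reduce.input.mb`, scaled by the count ratio and clamped to the max reducer number. A hedged sketch of that arithmetic; the default values used here are illustrative assumptions, not authoritative, and the function name is hypothetical:

```python
def kylin_reducer_count(total_input_mb: float,
                        per_reduce_input_mb: float = 500.0,
                        count_ratio: float = 1.0,
                        max_reducers: int = 5000) -> int:
    """Approximate Kylin's reducer sizing: about one reducer per
    per_reduce_input_mb of estimated cuboid data, scaled by the
    count ratio and clamped to [1, max_reducers]."""
    n = round(total_input_mb / per_reduce_input_mb * count_ratio)
    return max(1, min(max_reducers, int(n)))

# A small 30 MB cuboid stage gets a single reducer:
print(kylin_reducer_count(30))                           # -> 1

# Shrinking per_reduce_input_mb forces more reducers for the same input:
print(kylin_reducer_count(30, per_reduce_input_mb=10))   # -> 3
```

This is why lowering `kylin.job.mapreduce.default.reduce.input.mb` raises the reducer count but, as noted above, has no effect on the mapper count, which FileInputFormat decides from split sizes.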
> > >> >>> > >
> > >> >>> > >
> > >> >>> > > On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]>
> > >> >>>wrote:
> > >> >>> > >
> > >> >>> > > >Can I have the specification of these properties?
> > >> >>> > > >
> > >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
> > >> >>> > > >"kylin.job.mapreduce.default.reduce.count.ratio";
> > >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
> > >> >>> > > >"kylin.job.mapreduce.default.reduce.input.mb";
> > >> >>> > > >KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
> > >> >>> > > >"kylin.job.mapreduce.max.reducer.number";
> > >> >>> > > >
> > >> >>> > > >Thanks!
> > >> >>> > > >
> > >> >>> > > >On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra <
> > >> >>> [email protected]
> > >> >>> > >
> > >> >>> > > >wrote:
> > >> >>> > > >
> > >> >>> > > >> Hi Shi,
> > >> >>> > > >>
> > >> >>> > > >> It's alright!
> > >> >>> > > >> So, my source Hive table is around 3 GB; despite the table
> > >> >>> > > >> being partitioned and holding around 50-70 MB of data per
> > >> >>> > > >> partition, only a single mapper and reducer are spawned.
> > >> >>> > > >> The amount of data being processed in the M/R job is small,
> > >> >>> > > >> as expected, but it takes a very long time.
> > >> >>> > > >>
> > >> >>> > > >> As mentioned in the earlier mail, the job is very slow; the
> > >> >>> > > >> Build Base Cuboid Data step alone takes around 50 minutes
> > >> >>> > > >> to complete.
> > >> >>> > > >>
> > >> >>> > > >> I can tweak the reducer parameter you mentioned, but do you
> > >> >>> > > >> think that will make a difference, since the mapper is
> > >> >>> > > >> where most of the time is spent?
> > >> >>> > > >>
> > >> >>> > > >> Can you share your thoughts on performance tuning for the
> > >> >>> > > >> cube build?
> > >> >>> > > >>
> > >> >>> > > >> Thanks!
> > >> >>> > > >>
> > >> >>> > > >> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng
> > >> >>><[email protected]>
> > >> >>> > > wrote:
> > >> >>> > > >>
> > >> >>> > > >>> Hi, sorry, a busy weekend;
> > >> >>> > > >>>
> > >> >>> > > >>> Usually Kylin will request a proper number of mappers and
> > >> >>> > > >>> reducers. If you see a single mapper/reducer, how large
> > >> >>> > > >>> are your input and output? If your cube is quite small, a
> > >> >>> > > >>> single mapper/reducer is possible.
> > >> >>> > > >>>
> > >> >>> > > >>> The number of mappers is decided by the FileInputFormat,
> > >> >>> > > >>> but the number of reducers is set by Kylin; see:
> > >> >>> > > >>>
> > >> >>> > > >>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
> > >> >>> > > >>>
> > >> >>> > > >>>
> > >> >>> > > >>>
> > >> >>> > > >>>
> > >> >>> > > >>> On 6/14/15, 5:25 PM, "Vineet Mishra"
> > >><[email protected]>
> > >> >>> wrote:
> > >> >>> > > >>>
> > >> >>> > > >>> >Urgent call, any follow up on this?
> > >> >>> > > >>> >
> > >> >>> > > >>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra
> > >> >>> > > >>><[email protected]>
> > >> >>> > > >>> >wrote:
> > >> >>> > > >>> >
> > >> >>> > > >>> >>
> > >> >>> > > >>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer
> > >> >>> > > >>> >> running with a single mapper/reducer for the job? Can
> > >> >>> > > >>> >> you help me understand the reason it runs as a single
> > >> >>> > > >>> >> mapper/reducer?
> > >> >>> > > >>> >>
> > >> >>> > > >>> >> Thanks!
> > >> >>> > > >>> >>
> > >> >>> > > >>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra
> > >> >>> > > >>><[email protected]
> > >> >>> > > >>> >
> > >> >>> > > >>> >> wrote:
> > >> >>> > > >>> >>
> > >> >>> > > >>> >>> Hi All,
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>> I am building a cube using Kylin, and I can see that
> > >> >>> > > >>> >>> the job runs with a single mapper and reducer for some
> > >> >>> > > >>> >>> of the intermediate steps, such as:
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>> Extract Fact Table Distinct Columns
> > >> >>> > > >>> >>> Build Dimension Dictionary
> > >> >>> > > >>> >>> Build N-Dimension Cuboid
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>> I am not sure of the reason behind running the job
> > >> >>> > > >>> >>> with a single M/R; is it really necessary, or is it
> > >> >>> > > >>> >>> some default config which can be tweaked? It has been
> > >> >>> > > >>> >>> 70 minutes and the job status is 25%!
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>> Urgent Call!
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>> Thanks!
> > >> >>> > > >>> >>>
> > >> >>> > > >>> >>
> > >> >>> > > >>> >>
> > >> >>> > > >>>
> > >> >>> > > >>>
> > >> >>> > > >>
> > >> >>> > >
> > >> >>> > >
> > >> >>> >
> > >> >>>
> > >> >>
> > >> >>
> > >>
> > >>
> >
> >
>
