Hi Shi,

Well yes, these cubes were built over the same date range without the data
on the table having changed, and moreover HBase compression has not been
touched so far.

One interesting thing I noticed: the later cube, which was smaller and
finished in less time, had multiple measures (around 7 or 8), whereas the
former cube had just 1 or perhaps 2 measures yet took more time to
complete.

Running the same job again, it came up with a different size for the same
data: last time it was 130 MB, now it is even less, around 123 MB. This
change was observed within 4 hours of running the last job with the same
data set.

I am not sure how Kylin is internally building these cube dimensions so as
to inflate the data size by 3x, and moreover when the measures are so few
in comparison.
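
(A rough sanity check on why dimensions could dominate: with D dimensions a
full cube can materialize up to 2^D cuboids, e.g. 10 dimensions give
2^10 = 1024 cuboids, so each extra dimension roughly doubles the cuboid
count, while an extra measure only widens each row. These numbers are
illustrative only, not taken from this cube.)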

Let me know if you could get anything out of it.

Thanks!

On Mon, Jun 15, 2015 at 8:51 PM, Shi, Shaofeng <[email protected]> wrote:

> Just a guess: do these two cube builds cover the same date range? Was the
> data in the fact table changed between these two builds? Was HBase
> compression enabled recently?
>
> You can build once more with the same cube and date range to see whether
> it can be reproduced; the cube size should be consistent, to my knowledge;
>
> On 6/15/15, 10:52 PM, "Vineet Mishra" <[email protected]> wrote:
>
> >Thanks Shi,
> >
> >By the way, I was curious: for the same table, dimensions and metrics, I
> >could see the final cube size varying from 130 MB to 420 MB.
> >
> >The cube is exactly the same as the other one, but when the cube build
> >job is run I see the cube size difference mentioned below:
> >
> >*Earlier Cube Build*
> >Size - 420 MB
> >Time taken for building cube - 140 mins
> >Source records - ~11 million
> >
> >*Today's Cube Build*
> >Size - 130 MB
> >Time taken for building cube - 22 mins
> >Source records - ~11 million
> >
> >I don't see any reason for a threefold difference in cube size. Any ideas
> >about this?
> >
> >Thanks!
> >
> >On Mon, Jun 15, 2015 at 8:08 PM, Shi, Shaofeng <[email protected]> wrote:
> >
> >> Good catch, and thanks for sharing;
> >>
> >> On 6/15/15, 10:20 PM, "Vineet Mishra" <[email protected]> wrote:
> >>
> >> >Hi All,
> >> >
> >> >Well I got it through.
> >> >
> >> >Actually it was the pre-processing step, which converts my 3 GB Hive
> >> >table into a 60 MB sequence file; due to the 128 MB block size, that
> >> >was running as a single mapper job.
> >> >
> >> >I changed the split size to 20 MB and it was able to spawn 3 mappers
> >> >after that.
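> >> >
> >> >(For reference, a minimal sketch of the kind of setting involved,
> >> >assuming the standard Hadoop 2 split property can be added to
> >> >conf/kylin_job_conf.xml; the exact property name may differ on older
> >> >MRv1 clusters:
> >> >
> >> >  <property>
> >> >    <name>mapreduce.input.fileinputformat.split.maxsize</name>
> >> >    <!-- 20 MB in bytes; a 60 MB input then yields ~3 splits -->
> >> >    <value>20971520</value>
> >> >  </property>
> >> >)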
> >> >
> >> >Anyways, thanks all for your quick response.
> >> >
> >> >Thanks!
> >> >
> >> >On Mon, Jun 15, 2015 at 3:27 PM, Vineet Mishra <[email protected]>
> >> >wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Sorry, I misunderstood this value; actually 30 MB was the output of
> >> >> that job. But I think I have found the reason (not yet confirmed) why
> >> >> it behaves like this: the input to this job is a sequence file which
> >> >> is 60 MB in size and compressed.
> >> >>
> >> >> When it is passed to the mapper, a single mapper is almost certain
> >> >> to be invoked due to the pre-configured block size. Now I need to
> >> >> check whether I can override the split size property for the job.
> >> >>
> >> >> Moreover, I can see that the job is running extremely slowly; kindly
> >> >> find the job log below.
> >> >>
> >> >> 2015-06-13 10:08:00,940 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> Processing split: hdfs://dev-hadoop-namenode.com:8020/tmp/kylin-9b675c66-4ce9-4a33-a356-2ccf9dbaca6a/kylin_intermediate_xyz_20150401000000_20150611000000_9b675c66_4ce9_4a33_a356_2ccf9dbaca6a/000000_0:0+60440271
> >> >> 2015-06-13 10:08:01,014 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> (EQUATOR) 0 kvi 67108860(268435440)
> >> >> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> mapreduce.task.io.sort.mb: 256
> >> >> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >>soft
> >> >> limit at 214748368
> >> >> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> bufstart = 0; bufvoid = 268435456
> >> >> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> kvstart = 67108860; length = 16777216
> >> >> 2015-06-13 10:08:01,023 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >>Map
> >> >> output collector class =
> >> >>org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> >> >> 2015-06-13 10:08:01,048 INFO [main]
> >> >> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> >> >> initialized native-zlib library
> >> >> 2015-06-13 10:08:01,048 INFO [main]
> >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >> >> [.deflate]
> >> >> 2015-06-13 10:08:01,052 INFO [main]
> >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >> >> [.deflate]
> >> >> 2015-06-13 10:08:01,052 INFO [main]
> >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >> >> [.deflate]
> >> >> 2015-06-13 10:08:01,053 INFO [main]
> >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >> >> [.deflate]
> >> >> 2015-06-13 10:08:01,058 INFO [main]
> >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> >> >> 2015-06-13 10:08:01,059 INFO [main]
> >> >> org.apache.kylin.common.KylinConfig: Use KYLIN_CONF=/yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> >> >> 2015-06-13 10:08:01,084 INFO [main]
> >>org.apache.kylin.cube.CubeManager:
> >> >> Initializing CubeManager with config
> >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> >> >> 2015-06-13 10:08:01,086 INFO [main]
> >> >> org.apache.kylin.common.persistence.ResourceStore: Using metadata url
> >> >> /yarn/nm/usercache/biops/filecache/19452/meta for resource store
> >> >> 2015-06-13 10:08:01,327 INFO [main]
> >> >>org.apache.kylin.cube.CubeDescManager:
> >> >> Initializing CubeDescManager with config
> >> >> /yarn/nm/usercache/biops/filecache/19452/meta
> >> >> 2015-06-13 10:08:01,327 INFO [main]
> >> >>org.apache.kylin.cube.CubeDescManager:
> >> >> Reloading Cube Metadata from folder
> >> >> /yarn/nm/usercache/biops/filecache/19452/meta/cube_desc
> >> >> 2015-06-13 10:08:22,834 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 100000
> >> >>records!
> >> >> 2015-06-13 10:08:43,225 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 200000
> >> >>records!
> >> >> 2015-06-13 10:09:03,622 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 300000
> >> >>records!
> >> >> 2015-06-13 10:09:23,999 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 400000
> >> >>records!
> >> >> 2015-06-13 10:09:44,372 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 500000
> >> >>records!
> >> >> 2015-06-13 10:10:04,780 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 600000
> >> >>records!
> >> >> 2015-06-13 10:10:25,143 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 700000
> >> >>records!
> >> >> 2015-06-13 10:10:45,512 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 800000
> >> >>records!
> >> >> 2015-06-13 10:11:05,895 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 900000
> >> >>records!
> >> >> 2015-06-13 10:11:26,284 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1000000
> >> >>records!
> >> >> 2015-06-13 10:11:46,716 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1100000
> >> >>records!
> >> >> 2015-06-13 10:12:07,174 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1200000
> >> >>records!
> >> >> 2015-06-13 10:12:27,646 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1300000
> >> >>records!
> >> >> 2015-06-13 10:12:48,127 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1400000
> >> >>records!
> >> >> 2015-06-13 10:13:08,614 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1500000
> >> >>records!
> >> >> 2015-06-13 10:13:29,148 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1600000
> >> >>records!
> >> >> 2015-06-13 10:13:49,686 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1700000
> >> >>records!
> >> >> 2015-06-13 10:14:10,167 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1800000
> >> >>records!
> >> >> 2015-06-13 10:14:30,652 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 1900000
> >> >>records!
> >> >> 2015-06-13 10:14:51,199 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2000000
> >> >>records!
> >> >> 2015-06-13 10:15:11,719 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2100000
> >> >>records!
> >> >> 2015-06-13 10:15:32,221 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2200000
> >> >>records!
> >> >> 2015-06-13 10:15:52,717 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2300000
> >> >>records!
> >> >> 2015-06-13 10:16:13,252 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2400000
> >> >>records!
> >> >> 2015-06-13 10:16:33,769 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2500000
> >> >>records!
> >> >> 2015-06-13 10:16:54,263 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2600000
> >> >>records!
> >> >> 2015-06-13 10:17:14,741 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2700000
> >> >>records!
> >> >> 2015-06-13 10:17:35,213 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2800000
> >> >>records!
> >> >> 2015-06-13 10:17:55,699 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 2900000
> >> >>records!
> >> >> 2015-06-13 10:18:16,204 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3000000
> >> >>records!
> >> >> 2015-06-13 10:18:36,721 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3100000
> >> >>records!
> >> >> 2015-06-13 10:18:57,249 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3200000
> >> >>records!
> >> >> 2015-06-13 10:19:17,743 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3300000
> >> >>records!
> >> >> 2015-06-13 10:19:38,275 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3400000
> >> >>records!
> >> >> 2015-06-13 10:19:58,812 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3500000
> >> >>records!
> >> >> 2015-06-13 10:20:19,312 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3600000
> >> >>records!
> >> >> 2015-06-13 10:20:39,920 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3700000
> >> >>records!
> >> >> 2015-06-13 10:21:00,518 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3800000
> >> >>records!
> >> >> 2015-06-13 10:21:21,008 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 3900000
> >> >>records!
> >> >> 2015-06-13 10:21:41,525 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4000000
> >> >>records!
> >> >> 2015-06-13 10:22:02,023 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4100000
> >> >>records!
> >> >> 2015-06-13 10:22:22,534 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4200000
> >> >>records!
> >> >> 2015-06-13 10:22:43,063 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4300000
> >> >>records!
> >> >> 2015-06-13 10:23:03,558 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4400000
> >> >>records!
> >> >> 2015-06-13 10:23:24,043 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4500000
> >> >>records!
> >> >> 2015-06-13 10:23:44,537 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4600000
> >> >>records!
> >> >> 2015-06-13 10:24:05,015 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4700000
> >> >>records!
> >> >> 2015-06-13 10:24:25,490 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4800000
> >> >>records!
> >> >> 2015-06-13 10:24:45,995 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 4900000
> >> >>records!
> >> >> 2015-06-13 10:25:06,487 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5000000
> >> >>records!
> >> >> 2015-06-13 10:25:26,964 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5100000
> >> >>records!
> >> >> 2015-06-13 10:25:47,524 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5200000
> >> >>records!
> >> >> 2015-06-13 10:26:08,032 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5300000
> >> >>records!
> >> >> 2015-06-13 10:26:28,514 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5400000
> >> >>records!
> >> >> 2015-06-13 10:26:49,053 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5500000
> >> >>records!
> >> >> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> Spilling map output
> >> >> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> bufstart = 0; bufend = 126646464; bufvoid = 268435456
> >> >> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> kvstart = 67108860(268435440); kvend = 45083392(180333568); length =
> >> >> 22025469/16777216
> >> >> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> (EQUATOR) 148671936 kvi 37167980(148671920)
> >> >> 2015-06-13 10:26:54,802 INFO [SpillThread]
> >> >> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> >> >>[.snappy]
> >> >> 2015-06-13 10:26:54,883 INFO [SpillThread]
> >> >> org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> >> >> 2015-06-13 10:27:09,614 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5600000
> >> >>records!
> >> >> 2015-06-13 10:27:30,135 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5700000
> >> >>records!
> >> >> 2015-06-13 10:27:50,657 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5800000
> >> >>records!
> >> >> 2015-06-13 10:27:59,691 INFO [SpillThread]
> >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 100000
> >>records!
> >> >> 2015-06-13 10:28:11,185 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 5900000
> >> >>records!
> >> >> 2015-06-13 10:28:31,751 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6000000
> >> >>records!
> >> >> 2015-06-13 10:28:52,303 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6100000
> >> >>records!
> >> >> 2015-06-13 10:29:03,235 INFO [SpillThread]
> >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 200000
> >>records!
> >> >> 2015-06-13 10:29:12,873 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6200000
> >> >>records!
> >> >> 2015-06-13 10:29:33,424 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6300000
> >> >>records!
> >> >> 2015-06-13 10:29:54,168 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6400000
> >> >>records!
> >> >> 2015-06-13 10:30:14,771 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6500000
> >> >>records!
> >> >> 2015-06-13 10:30:18,479 INFO [SpillThread]
> >> >> org.apache.kylin.job.hadoop.cube.CuboidReducer: Handled 300000
> >>records!
> >> >> 2015-06-13 10:30:35,325 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6600000
> >> >>records!
> >> >> 2015-06-13 10:30:39,831 INFO [SpillThread]
> >> >> org.apache.hadoop.mapred.MapTask: Finished spill 0
> >> >> 2015-06-13 10:30:39,831 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> (RESET) equator 148671936 kv 37167980(148671920) kvi
> >>32705668(130822672)
> >> >> 2015-06-13 10:30:55,807 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6700000
> >> >>records!
> >> >> 2015-06-13 10:31:16,301 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6800000
> >> >>records!
> >> >> 2015-06-13 10:31:36,804 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 6900000
> >> >>records!
> >> >> 2015-06-13 10:31:57,302 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7000000
> >> >>records!
> >> >> 2015-06-13 10:32:17,790 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7100000
> >> >>records!
> >> >> 2015-06-13 10:32:38,294 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7200000
> >> >>records!
> >> >> 2015-06-13 10:32:58,830 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7300000
> >> >>records!
> >> >> 2015-06-13 10:33:19,354 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7400000
> >> >>records!
> >> >> 2015-06-13 10:33:39,866 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7500000
> >> >>records!
> >> >> 2015-06-13 10:34:00,371 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7600000
> >> >>records!
> >> >> 2015-06-13 10:34:20,872 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7700000
> >> >>records!
> >> >> 2015-06-13 10:34:41,387 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7800000
> >> >>records!
> >> >> 2015-06-13 10:35:01,884 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 7900000
> >> >>records!
> >> >> 2015-06-13 10:35:22,368 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8000000
> >> >>records!
> >> >> 2015-06-13 10:35:42,868 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8100000
> >> >>records!
> >> >> 2015-06-13 10:36:03,361 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8200000
> >> >>records!
> >> >> 2015-06-13 10:36:23,981 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8300000
> >> >>records!
> >> >> 2015-06-13 10:36:44,529 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8400000
> >> >>records!
> >> >> 2015-06-13 10:37:05,034 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8500000
> >> >>records!
> >> >> 2015-06-13 10:37:25,609 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8600000
> >> >>records!
> >> >> 2015-06-13 10:37:46,152 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8700000
> >> >>records!
> >> >> 2015-06-13 10:38:06,725 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8800000
> >> >>records!
> >> >> 2015-06-13 10:38:27,296 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 8900000
> >> >>records!
> >> >> 2015-06-13 10:38:47,905 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9000000
> >> >>records!
> >> >> 2015-06-13 10:39:08,430 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9100000
> >> >>records!
> >> >> 2015-06-13 10:39:28,952 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9200000
> >> >>records!
> >> >> 2015-06-13 10:39:49,460 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9300000
> >> >>records!
> >> >> 2015-06-13 10:40:09,987 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9400000
> >> >>records!
> >> >> 2015-06-13 10:40:30,551 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9500000
> >> >>records!
> >> >> 2015-06-13 10:40:51,080 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9600000
> >> >>records!
> >> >> 2015-06-13 10:41:11,579 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9700000
> >> >>records!
> >> >> 2015-06-13 10:41:32,122 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9800000
> >> >>records!
> >> >> 2015-06-13 10:41:52,623 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 9900000
> >> >>records!
> >> >> 2015-06-13 10:42:13,128 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10000000
> >> >>records!
> >> >> 2015-06-13 10:42:33,628 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10100000
> >> >>records!
> >> >> 2015-06-13 10:42:54,128 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10200000
> >> >>records!
> >> >> 2015-06-13 10:43:14,629 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10300000
> >> >>records!
> >> >> 2015-06-13 10:43:35,149 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10400000
> >> >>records!
> >> >> 2015-06-13 10:43:55,764 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10500000
> >> >>records!
> >> >> 2015-06-13 10:44:16,303 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10600000
> >> >>records!
> >> >> 2015-06-13 10:44:36,857 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10700000
> >> >>records!
> >> >> 2015-06-13 10:44:57,349 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10800000
> >> >>records!
> >> >> 2015-06-13 10:45:17,840 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 10900000
> >> >>records!
> >> >> 2015-06-13 10:45:38,441 INFO [main]
> >> >> org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 11000000
> >> >>records!
> >> >> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> Spilling map output
> >> >> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> bufstart = 148671936; bufend = 6882957; bufvoid = 268435443
> >> >> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> kvstart = 37167980(148671920); kvend = 15142512(60570048); length =
> >> >> 22025469/16777216
> >> >> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask:
> >> >> (EQUATOR) 28908413 kvi 7227096(28908384)
> >> >>
> >> >> Thanks,
> >> >>
> >> >> On Mon, Jun 15, 2015 at 3:11 PM, 周千昊 <[email protected]> wrote:
> >> >>
> >> >>> Hi Mishra
> >> >>>      It is as you described: since the data size is only 30 MB,
> >> >>> Hadoop will run the MR job with a single mapper and reducer.
> >> >>>      As for your question about why it takes so long on such a small
> >> >>> dataset, can you please dig into the *map reduce task status* web
> >> >>> page to check how much time the MR job really takes, so that we can
> >> >>> determine whether the time is spent in the MR job or in Kylin's job
> >> >>> scheduling module?
> >> >>>
> >> >>> Vineet Mishra <[email protected]> wrote on Monday, Jun 15, 2015 at 4:51 PM:
> >> >>>
> >> >>> > Shi,
> >> >>> >
> >> >>> > Hadoop is set up correctly on my cluster with the default block
> >> >>> > size of 128 MB, and it is indeed running multiple mapper/reducer
> >> >>> > based jobs in other cases.
> >> >>> >
> >> >>> > It is only the Kylin cube build that is running as a single M/R
> >> >>> > job.
> >> >>> >
> >> >>> > Moreover, to my surprise, the 4th running job, Build Base Cuboid
> >> >>> > Data, shows a data size of 30 MB. Is that the reason a single
> >> >>> > mapper is invoked? And if that is the case, why does processing
> >> >>> > such a small data set take around 50 minutes?
> >> >>> >
> >> >>> > Thanks,
> >> >>> >
> >> >>> > On Mon, Jun 15, 2015 at 11:17 AM, Shi, Shaofeng <[email protected]>
> >> >>> > wrote:
> >> >>> >
> >> >>> > > Yes, you can adjust these parameters, for example by giving a
> >> >>> > > smaller value for kylin.job.mapreduce.default.reduce.input.mb;
> >> >>> > > but it only affects the reducer number;
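> >> >>> > >
> >> >>> > > (For illustration only, these are set in conf/kylin.properties;
> >> >>> > > the values below are examples, not recommended defaults:
> >> >>> > >
> >> >>> > >   kylin.job.mapreduce.default.reduce.input.mb=100
> >> >>> > >   kylin.job.mapreduce.default.reduce.count.ratio=1.0
> >> >>> > >   kylin.job.mapreduce.max.reducer.number=500
> >> >>> > > )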
> >> >>> > >
> >> >>> > > I suggest you investigate why only 1 mapper is started; factors
> >> >>> > > like Hadoop cluster size and HDFS file block size will impact
> >> >>> > > this. You can run a SQL query (one which needs to run MR, not a
> >> >>> > > simple select *) with hive -e, and then use the MR job tracking
> >> >>> > > URL to see how many mappers are triggered; if it is still a
> >> >>> > > single mapper, then the problem is in your Hadoop configuration;
> >> >>> > > otherwise it may exist in Kylin; check whether you put some
> >> >>> > > additional parameters in conf/kylin_job_conf.xml.
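> >> >>> > >
> >> >>> > > (For example, something like the following, where the table and
> >> >>> > > column names are placeholders for your own fact table:
> >> >>> > >
> >> >>> > >   hive -e "SELECT dim_col, COUNT(*) FROM fact_table GROUP BY dim_col"
> >> >>> > > )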
> >> >>> > >
> >> >>> > >
> >> >>> > > On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]>
> >> >>>wrote:
> >> >>> > >
> >> >>> > > >Can I have the specifications for these properties?
> >> >>> > > >
> >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
> >> >>> > > >"kylin.job.mapreduce.default.reduce.count.ratio";
> >> >>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
> >> >>> > > >"kylin.job.mapreduce.default.reduce.input.mb";
> >> >>> > > >KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
> >> >>> > > >"kylin.job.mapreduce.max.reducer.number";
> >> >>> > > >
> >> >>> > > >Thanks!
> >> >>> > > >
> >> >>> > > >On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra
> >> >>> > > ><[email protected]> wrote:
> >> >>> > > >
> >> >>> > > >> Hi Shi,
> >> >>> > > >>
> >> >>> > > >> It's alright!
> >> >>> > > >> So I was wondering: my source Hive table is around 3 GB, and
> >> >>> > > >> despite the table being partitioned and holding around
> >> >>> > > >> 50-70 MB of data per partition, a single mapper and reducer
> >> >>> > > >> get spawned. The amount of data processed in the M/R job is
> >> >>> > > >> small, as expected, yet it takes a huge amount of time.
> >> >>> > > >>
> >> >>> > > >> As mentioned in the earlier mail, the job is getting very
> >> >>> > > >> slow; the Build Base Cuboid Data step alone takes around 50
> >> >>> > > >> minutes to complete.
> >> >>> > > >>
> >> >>> > > >> I can tweak the reducer parameter you mentioned, but do you
> >> >>> > > >> think that will make a difference, given that the mapper is
> >> >>> > > >> where most of the time is spent?
> >> >>> > > >>
> >> >>> > > >> Can you share your thoughts on performance tuning for the
> >> >>> > > >> cube build?
> >> >>> > > >>
> >> >>> > > >> Thanks!
> >> >>> > > >>
> >> >>> > > >> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng
> >> >>> > > >> <[email protected]> wrote:
> >> >>> > > >>
> >> >>> > > >>> Hi, sorry, a busy weekend;
> >> >>> > > >>>
> >> >>> > > >>> Usually Kylin will request a proper number of mappers and
> >> >>> > > >>> reducers; if you see a single mapper/reducer, how large are
> >> >>> > > >>> your input and output? If your cube is quite small, a single
> >> >>> > > >>> mapper/reducer is possible;
> >> >>> > > >>>
> >> >>> > > >>> The number of mappers is decided by the FileInputFormat,
> >> >>> > > >>> but the number of reducers is set by Kylin; see:
> >> >>> > > >>>
> >> >>> > > >>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
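> >> >>> > > >>>
> >> >>> > > >>> (Roughly, the logic there derives the reducer count from the
> >> >>> > > >>> input size; treat this as a sketch, not the authoritative
> >> >>> > > >>> formula:
> >> >>> > > >>>
> >> >>> > > >>>   reducers = min(kylin.job.mapreduce.max.reducer.number,
> >> >>> > > >>>                  max(1, total_input_mb
> >> >>> > > >>>                         / kylin.job.mapreduce.default.reduce.input.mb
> >> >>> > > >>>                         * kylin.job.mapreduce.default.reduce.count.ratio))
> >> >>> > > >>> )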
> >> >>> > > >>>
> >> >>> > > >>>
> >> >>> > > >>>
> >> >>> > > >>>
> >> >>> > > >>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]>
> >> >>> > > >>> wrote:
> >> >>> > > >>>
> >> >>> > > >>> >Urgent call, any follow up on this?
> >> >>> > > >>> >
> >> >>> > > >>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra
> >> >>> > > >>><[email protected]>
> >> >>> > > >>> >wrote:
> >> >>> > > >>> >
> >> >>> > > >>> >>
> >> >>> > > >>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer
> >> >>> > > >>> >> running a single mapper/reducer for the job? Can I get an
> >> >>> > > >>> >> explanation of the reason behind running it as a single
> >> >>> > > >>> >> mapper/reducer?
> >> >>> > > >>> >>
> >> >>> > > >>> >> Thanks!
> >> >>> > > >>> >>
> >> >>> > > >>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra
> >> >>> > > >>><[email protected]
> >> >>> > > >>> >
> >> >>> > > >>> >> wrote:
> >> >>> > > >>> >>
> >> >>> > > >>> >>> Hi All,
> >> >>> > > >>> >>>
> >> >>> > > >>> >>> I am building a cube using Kylin and I can see that the
> >> >>> > > >>> >>> job is running with a single mapper and reducer for some
> >> >>> > > >>> >>> of the intermediate steps, such as:
> >> >>> > > >>> >>>
> >> >>> > > >>> >>> Extract Fact Table Distinct Columns
> >> >>> > > >>> >>> Build Dimension Dictionary
> >> >>> > > >>> >>> Build N-Dimension Cuboid
> >> >>> > > >>> >>>
> >> >>> > > >>> >>> I am not sure of the reason behind running the job with
> >> >>> > > >>> >>> a single M/R; is it really necessary, or is it some
> >> >>> > > >>> >>> default config which can be tweaked? It has been 70
> >> >>> > > >>> >>> minutes and the job status is at 25%!
> >> >>> > > >>> >>>
> >> >>> > > >>> >>> Urgent Call!
> >> >>> > > >>> >>>
> >> >>> > > >>> >>> Thanks!
> >> >>> > > >>> >>>
> >> >>> > > >>> >>
> >> >>> > > >>> >>
> >> >>> > > >>>
> >> >>> > > >>>
> >> >>> > > >>
> >> >>> > >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >>
> >>
>
>
