Hi All,

Well, I got it through.
It was basically the pre-process, which converted my 3 GB Hive table into a 60 MB sequence file that, given the 128 MB block size, ran as a single mapper. I changed the split size to 20 MB and the job was able to spawn 3 mappers after that. Anyway, thanks all for your quick responses.

Thanks!

On Mon, Jun 15, 2015 at 3:27 PM, Vineet Mishra <[email protected]> wrote:

> Hi,
>
> Sorry, I misunderstood this value; 30 MB was actually the output for that
> job. But I think I have found the reason (not yet confirmed) why it is
> behaving like this: the input to this job is a sequence file which is
> 60 MB in size and compressed.
>
> No doubt, when it is passed to the mapper, a single mapper is almost
> certain to be invoked given the pre-configured block size. Now I need to
> check whether I can override the split-size property for the job.
>
> Moreover, I can see that the job is running exceedingly slowly; kindly
> find the (abridged) log for the job below.
>
> 2015-06-13 10:08:00,940 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://dev-hadoop-namenode.com:8020/tmp/kylin-9b675c66-4ce9-4a33-a356-2ccf9dbaca6a/kylin_intermediate_xyz_20150401000000_20150611000000_9b675c66_4ce9_4a33_a356_2ccf9dbaca6a/000000_0:0+60440271
> 2015-06-13 10:08:01,014 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 67108860(268435440)
> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 256
> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 214748368
> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 268435456
> 2015-06-13 10:08:01,015 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860; length = 16777216
> 2015-06-13 10:08:01,023 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 2015-06-13 10:08:01,048 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2015-06-13 10:08:01,048 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> [... three more identical "brand-new decompressor [.deflate]" lines elided ...]
> 2015-06-13 10:08:01,058 INFO [main] org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> 2015-06-13 10:08:01,059 INFO [main] org.apache.kylin.common.KylinConfig: Use KYLIN_CONF=/yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> 2015-06-13 10:08:01,084 INFO [main] org.apache.kylin.cube.CubeManager: Initializing CubeManager with config /yarn/nm/usercache/biops/filecache/19452/meta
> 2015-06-13 10:08:01,086 INFO [main] org.apache.kylin.common.persistence.ResourceStore: Using metadata url /yarn/nm/usercache/biops/filecache/19452/meta for resource store
> 2015-06-13 10:08:01,327 INFO [main] org.apache.kylin.cube.CubeDescManager: Initializing CubeDescManager with config /yarn/nm/usercache/biops/filecache/19452/meta
> 2015-06-13 10:08:01,327 INFO [main] org.apache.kylin.cube.CubeDescManager: Reloading Cube Metadata from folder /yarn/nm/usercache/biops/filecache/19452/meta/cube_desc
> 2015-06-13 10:08:22,834 INFO [main] org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 100000 records!
> 2015-06-13 10:08:43,225 INFO [main] org.apache.kylin.job.hadoop.cube.BaseCuboidMapper: Handled 200000 records!
> [... BaseCuboidMapper progress lines elided: roughly 100000 records every ~20.5 s, through 5500000 records at 10:26:49 ...]
> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 126646464; bufvoid = 268435456
> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860(268435440); kvend = 45083392(180333568); length = 22025469/16777216
> 2015-06-13 10:26:50,360 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 148671936 kvi 37167980(148671920)
> 2015-06-13 10:26:54,802 INFO [SpillThread] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
> 2015-06-13 10:26:54,883 INFO [SpillThread] org.apache.kylin.job.hadoop.AbstractHadoopJob: The absolute path for meta dir is /yarn/nm/usercache/biops/appcache/application_1433833901375_0139/container_1433833901375_0139_01_000002/meta
> [... mapper progress continues through 6600000 records; interleaved SpillThread CuboidReducer lines: Handled 100000 records at 10:27:59, 200000 at 10:29:03, 300000 at 10:30:18 ...]
> 2015-06-13 10:30:39,831 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2015-06-13 10:30:39,831 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 148671936 kv 37167980(148671920) kvi 32705668(130822672)
> [... BaseCuboidMapper progress lines elided, through 11000000 records at 10:45:38 ...]
> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 148671936; bufend = 6882957; bufvoid = 268435443
> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 37167980(148671920); kvend = 15142512(60570048); length = 22025469/16777216
> 2015-06-13 10:45:41,057 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 28908413 kvi 7227096(28908384)
>
> Thanks,
>
> On Mon, Jun 15, 2015 at 3:11 PM, 周千昊 <[email protected]> wrote:
>
>> Hi Mishra,
>> It is as you described: since the data size is only 30 MB, Hadoop
>> will run the MR job with a single mapper and reducer.
>> As for your question about why it takes so long on such a small
>> dataset, could you please dig into the *map reduce task status* web page
>> to check how much time the MR job really takes, so that we can determine
>> whether the time is spent in the MR job itself or in Kylin's
>> job-scheduling module?
>>
>> Vineet Mishra <[email protected]> wrote on Mon, Jun 15, 2015 at 4:51 PM:
>>
>> > Shi,
>> >
>> > Hadoop is set up correctly on my cluster with the default block size
>> > of 128 MB, and it is indeed running multiple mappers/reducers for
>> > other jobs.
>> >
>> > It is only the Kylin cube build that runs through a single M/R job.
>> >
>> > Moreover, to my surprise, the 4th running job, Build Base Cuboid Data,
>> > shows a data size of 30 MB. Is that the reason a single mapper is
>> > invoked? If so, why does it still take around 50 min to process such
>> > a small dataset?
>> >
>> > Thanks,
>> >
>> > On Mon, Jun 15, 2015 at 11:17 AM, Shi, Shaofeng <[email protected]> wrote:
>> >
>> > > Yes, you can adjust these parameters, for example by giving a smaller
>> > > value for kylin.job.mapreduce.default.reduce.input.mb; but that only
>> > > affects the reducer number.
>> > >
>> > > I suggest you investigate why only 1 mapper is started; factors like
>> > > the Hadoop cluster size and the HDFS file block size will impact
>> > > this. You can run a SQL query with hive -e (one that needs to run MR,
>> > > not a simple select *) and then use the MR job tracking URL to see
>> > > how many mappers are triggered. If it is still a single mapper, the
>> > > problem is in your Hadoop configuration; otherwise it may be in
>> > > Kylin, so check whether you put some additional parameter in
>> > > conf/kylin_job_conf.xml.
>> > >
>> > > On 6/15/15, 2:52 AM, "Vineet Mishra" <[email protected]> wrote:
>> > >
>> > > >Can I have the specification for these properties?
>> > > >
>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_COUNT_RATIO =
>> > > >"kylin.job.mapreduce.default.reduce.count.ratio";
>> > > >KYLIN_JOB_MAPREDUCE_DEFAULT_REDUCE_INPUT_MB =
>> > > >"kylin.job.mapreduce.default.reduce.input.mb";
>> > > >KYLIN_JOB_MAPREDUCE_MAX_REDUCER_NUMBER =
>> > > >"kylin.job.mapreduce.max.reducer.number";
>> > > >
>> > > >Thanks!
>> > > >
>> > > >On Sun, Jun 14, 2015 at 11:59 PM, Vineet Mishra
>> > > ><[email protected]> wrote:
>> > > >
>> > > >> Hi Shi,
>> > > >>
>> > > >> It's alright!
>> > > >> So I was wondering: my source Hive table is around 3 GB, and
>> > > >> despite the table being partitioned and holding around 50-70 MB of
>> > > >> data per partition, only a single mapper and reducer are spawned.
>> > > >> The amount of data being processed in the M/R is nothing, yet it
>> > > >> takes a very long time.
>> > > >>
>> > > >> As mentioned in the trailing mail, the job is getting very slow;
>> > > >> the Build Base Cuboid Data step alone takes around 50 min to
>> > > >> complete.
>> > > >>
>> > > >> I can tweak the reducer parameters you mentioned, but do you think
>> > > >> that will make a difference, since the mapper is where most of the
>> > > >> time is spent?
>> > > >>
>> > > >> Can you share your thoughts on performance tuning for the cube
>> > > >> build!
>> > > >>
>> > > >> Thanks!
>> > > >>
>> > > >> On Sun, Jun 14, 2015 at 7:26 PM, Shi, Shaofeng <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi, sorry, a busy weekend;
>> > > >>>
>> > > >>> Usually Kylin will request a proper number of mappers and
>> > > >>> reducers. If you see a single mapper/reducer, how much are your
>> > > >>> input and output? If your cube is quite small, a single
>> > > >>> mapper/reducer is possible.
>> > > >>>
>> > > >>> The number of mappers is decided by the FileInputFormat, but the
>> > > >>> number of reducers is set by Kylin; see:
>> > > >>>
>> > > >>> https://github.com/apache/incubator-kylin/blob/master/job/src/main/java/org/apache/kylin/job/hadoop/cube/CuboidJob.java#L141
>> > > >>>
>> > > >>> On 6/14/15, 5:25 PM, "Vineet Mishra" <[email protected]>
>> > > >>> wrote:
>> > > >>>
>> > > >>> >Urgent call, any follow up on this?
>> > > >>> >
>> > > >>> >On Fri, Jun 12, 2015 at 6:46 PM, Vineet Mishra
>> > > >>> ><[email protected]> wrote:
>> > > >>> >
>> > > >>> >> Why is org.apache.kylin.job.hadoop.cube.CuboidReducer running
>> > > >>> >> with a single mapper/reducer for the job? Can I have the
>> > > >>> >> reasoning behind running it as a single mapper/reducer?
>> > > >>> >>
>> > > >>> >> Thanks!
>> > > >>> >>
>> > > >>> >> On Fri, Jun 12, 2015 at 6:30 PM, Vineet Mishra
>> > > >>> >> <[email protected]> wrote:
>> > > >>> >>
>> > > >>> >>> Hi All,
>> > > >>> >>>
>> > > >>> >>> I am building a cube using Kylin and I can see that the job
>> > > >>> >>> is running with a single mapper and reducer for some of the
>> > > >>> >>> intermediate steps, such as:
>> > > >>> >>>
>> > > >>> >>> Extract Fact Table Distinct Columns
>> > > >>> >>> Build Dimension Dictionary
>> > > >>> >>> Build N-Dimension Cuboid
>> > > >>> >>>
>> > > >>> >>> I am not sure of the reason behind running the job with a
>> > > >>> >>> single M/R. Is it really necessary, or is it some default
>> > > >>> >>> config that can be tweaked? It has been 70 min and the job
>> > > >>> >>> status is 25%!
>> > > >>> >>>
>> > > >>> >>> Urgent Call!
>> > > >>> >>>
>> > > >>> >>> Thanks!
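[Editor's note, for readers landing on this thread from the archives: the arithmetic behind the resolution can be sketched as below. This is a sketch, not Kylin's actual code; the reducer heuristic loosely paraphrases the CuboidJob.java logic linked earlier, with illustrative default values, and the split-size knob referenced is the standard Hadoop 2 property mapreduce.input.fileinputformat.split.maxsize, which is an assumption about the configuration used, not something stated in the thread.]

```python
import math

def mapper_count(file_size_bytes, split_size_bytes):
    """Map tasks FileInputFormat spawns for one splittable file:
    one mapper per split; split size defaults to the HDFS block size."""
    return max(1, math.ceil(file_size_bytes / split_size_bytes))

# The 60 MB sequence file from the log (split 000000_0:0+60440271):
size = 60440271
MB = 1024 * 1024

# With the default 128 MB block/split size the whole file is one split,
# hence the single mapper observed in the thread:
assert mapper_count(size, 128 * MB) == 1

# After lowering the max split size to 20 MB:
# ceil(60440271 / 20971520) = 3 mappers, matching the reported outcome.
assert mapper_count(size, 20 * MB) == 3

def reducer_count(input_mb, per_reduce_input_mb=500, ratio=1.0,
                  max_reducers=500):
    """Rough paraphrase of Kylin's CuboidJob reducer sizing: reducers
    scale with input volume, clamped to [1, max_reducers]. The default
    values here are illustrative, not Kylin's shipped defaults."""
    n = round(input_mb / per_reduce_input_mb * ratio)
    return min(max_reducers, max(1, n))

# A 30 MB cuboid input stays on a single reducer under these settings,
# which is why tweaking only the reducer parameters would not have helped:
assert reducer_count(30) == 1

# Throughput visible in the log: ~100000 records per ~20.5 s in a single
# mapper, so 11 million records take about 11e6 / (100000 / 20.5) s,
# roughly 38 min, consistent with the ~50 min step time reported above.
```

Whether a given Kylin intermediate step honors the split-size property depends on the input format that step uses, so treat the property name and numbers above as starting points to verify against your own cluster configuration.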
