I have been working on a problem for several days without success.
Building the base cuboid hangs while flushing the map output (ca. 200 MB)
from the in-memory buffer to disk:
INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 190884986; bufvoid = 536870912
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 134217724(536870896); kvend = 116938212(467752848); length = 17279513/33554432
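To see how full the sort buffer was, the numbers in these log lines can be decoded with a little arithmetic. This is a sketch assuming Hadoop 2.x MapTask semantics: kvstart/kvend index an int view of the buffer, each record uses 4 ints (16 bytes) of metadata, and bufvoid is still the full buffer length. `decode_spill` is a hypothetical helper, not a Hadoop API.

```python
# Decode the "Spilling map output" log lines from the map task above.
# Assumption: Hadoop 2.x MapTask.MapOutputBuffer, where kvstart/kvend index an
# int view of the sort buffer and every record uses NMETA = 4 ints (16 bytes)
# of metadata. decode_spill is a hypothetical helper, not a Hadoop API.
NMETA = 4  # metadata ints per record (assumed Hadoop 2.x constant)


def decode_spill(bufstart, bufend, bufvoid, kvstart, kvend):
    """Summarize how full the map-side sort buffer was when the spill began."""
    meta_capacity = bufvoid // 4  # assumes bufvoid is still the full buffer length
    data_bytes = bufend - bufstart  # serialized key/value bytes (assumes no wrap)
    # Metadata grows downward from kvstart; % also covers the wrap-around case.
    meta_ints = (kvstart - kvend) % meta_capacity
    meta_bytes = meta_ints * 4
    return {
        "data_mib": data_bytes / 2**20,
        "meta_mib": meta_bytes / 2**20,
        "records": meta_ints // NMETA,
        "log_length": meta_ints + 1,  # should reproduce "length = ..." in the log
        "max_records": meta_capacity // NMETA,  # should reproduce ".../33554432"
        "used_fraction": (data_bytes + meta_bytes) / bufvoid,
    }


stats = decode_spill(bufstart=0, bufend=190884986, bufvoid=536870912,
                     kvstart=134217724, kvend=116938212)
for key, value in stats.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

If those assumptions hold, this works out to roughly 182 MiB of serialized output plus about 66 MiB of metadata for about 4.32 million records, a bit under half of the 512 MiB buffer. That is consistent with this being the final flush ("Starting flush of map output") rather than a threshold-triggered spill.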
I can see that the spill starts ("Spilling map output" is logged) and one
spill.out file is created on disk (only 80 KB), but then the process gets
stuck. Judging from the MapTask source code, I should then see the log
message "Finished spill", but it never appears. Has anybody experienced
this issue? Could the following properties be wrong?
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
(Kylin also sets mapreduce.task.io.sort.mb, but its value was not applied.
Someone wrote that mapreduce.task.io.sort.factor should also be set accordingly.)
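As a quick sanity check of these settings, here is a sketch that parses the two properties and derives the sort buffer size. It assumes the properties sit inside a normal <configuration> element. HEAP_MB is a hypothetical map-task heap size (the -Xmx from mapreduce.map.java.opts on your cluster), not a value taken from this job. The 2047 MB ceiling and the single byte[] allocation reflect my reading of Hadoop 2.x MapTask.

```python
# Sanity-check the sort settings quoted above.
# Assumptions: the properties sit inside a normal <configuration> element, and
# HEAP_MB is a hypothetical map-task heap (-Xmx from mapreduce.map.java.opts).
import xml.etree.ElementTree as ET

CONFIG_XML = """
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>
"""
HEAP_MB = 1024  # hypothetical; replace with the real -Xmx of the map task


def load_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_sort_settings(props, heap_mb):
    """Return (sort buffer size in bytes, list of problems found)."""
    sort_mb = int(props["mapreduce.task.io.sort.mb"])
    problems = []
    # As I read Hadoop 2.x MapTask, values that do not fit in 11 bits are rejected.
    if (sort_mb & 0x7FF) != sort_mb:
        problems.append("io.sort.mb must be between 0 and 2047")
    # The buffer is one byte[] of sort_mb MiB (sortmb << 20) on the map task heap,
    # so it must at least fit in the heap; real headroom needs to be larger.
    if sort_mb >= heap_mb:
        problems.append("io.sort.mb leaves no heap for the mapper itself")
    return sort_mb << 20, problems


buffer_bytes, problems = check_sort_settings(load_props(CONFIG_XML), HEAP_MB)
print(buffer_bytes, problems)
```

With io.sort.mb = 512 the derived buffer is 536870912 bytes, which matches bufvoid in the log. The sketch ignores mapreduce.task.io.sort.factor, which per mapred-default.xml sets how many streams are merged at once, not the size of the in-memory buffer.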
Another question: why does Kylin use only two map tasks when building the
base cuboid? Can we specify more map tasks?
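For context on the map count: in a FileInputFormat-based job, each input split becomes one map task, and the split size is max(minSize, min(maxSize, blockSize)). The sketch below mirrors that rule under stated assumptions: splittable, uncompressed input files, no CombineFileInputFormat, and hypothetical file sizes. Whether Kylin's base cuboid step splits its input this way is itself an assumption.

```python
# Estimate how many map tasks a FileInputFormat-style job would launch.
# Assumptions: splittable files, no CombineFileInputFormat, hypothetical sizes;
# Kylin's base cuboid step may compute its splits differently.
MiB = 2**20
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size
LONG_MAX = 2**63 - 1  # default max split size (Long.MAX_VALUE)


def split_size(block_size, min_size=1, max_size=LONG_MAX):
    """Same rule as FileInputFormat.computeSplitSize."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_sizes, block_size, min_size=1, max_size=LONG_MAX):
    """Return the number of input splits (= map tasks) for the given files."""
    size = split_size(block_size, min_size, max_size)
    total = 0
    for length in file_sizes:
        if length == 0:
            total += 1  # an empty file still gets one (empty) split
            continue
        remaining = length
        while remaining / size > SPLIT_SLOP:
            total += 1
            remaining -= size
        if remaining > 0:
            total += 1  # the tail of the file becomes the last split
    return total


# Hypothetical inputs: two 100 MiB files vs. one 300 MiB file, 128 MiB blocks.
print(count_splits([100 * MiB, 100 * MiB], block_size=128 * MiB))  # 2
print(count_splits([300 * MiB], block_size=128 * MiB))  # 3
print(count_splits([300 * MiB], block_size=128 * MiB, max_size=64 * MiB))  # 5
```

Under these assumptions, two input files each smaller than about 1.1 blocks yield exactly two map tasks, and lowering mapreduce.input.fileinputformat.split.maxsize produces more splits.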
Cheers,
Jie