I have been working on a problem for several days without success. Building the base cuboid hangs while flushing the map output (ca. 200 MB) from the memory buffer to disk:

INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 190884986; bufvoid = 536870912
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 134217724(536870896); kvend = 116938212(467752848); length = 17279513/33554432
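To sanity-check the numbers in the spill log above, here is a small Python sketch that recomputes them. It rests on an assumption (my reading of the Hadoop 2.x MapTask code, not verified against the Hadoop version on this cluster): bufvoid is the sort buffer size in bytes, kvstart/kvend index 4-byte metadata ints, and each record uses four of those ints (16 bytes). Under that assumption, bufvoid works out to exactly 512 MiB, which suggests the io.sort.mb = 512 setting did take effect for this task, and roughly 182 MiB of serialized map output was sitting in the buffer when the spill started.

```python
# Values copied from the "Spilling map output" log lines.
bufend = 190884986
bufvoid = 536870912
kvstart, kvstart_bytes = 134217724, 536870896
kvend, kvend_bytes = 116938212, 467752848
length, max_rec = 17279513, 33554432

MiB = 1024 * 1024

# Assumption: bufvoid is the sort buffer size, i.e. io.sort.mb in bytes.
sort_mb = bufvoid // MiB

# The value in parentheses appears to be the metadata int index times 4 bytes.
assert kvstart * 4 == kvstart_bytes
assert kvend * 4 == kvend_bytes

# Serialized map output waiting in the buffer, in MiB.
spilled_mib = bufend / MiB

# Assumption: 4 metadata ints of 4 bytes each per record -> 16 bytes per record.
assert bufvoid // 16 == max_rec

# Assumption: "length" is the distance from kvend to kvstart plus one.
assert kvstart - kvend + 1 == length

print(sort_mb, round(spilled_mib, 1))
```

All assertions pass on the logged values, so the log itself looks internally consistent; the hang seems to happen after the spill bookkeeping was set up, not in the buffer sizing.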

I see that spilling of the map output started and one spill.out file (only 80 KB) was created on disk, but then the process got stuck. Based on the MapTask source code, I should see the log message "Finished spill", which never appears. Has anybody experienced this issue? Could the following properties be wrong?

  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>

(Kylin also sets mapreduce.task.io.sort.mb, but its value was not applied. Someone wrote that mapreduce.task.io.sort.factor should be set correspondingly as well.)

Another question: why does Kylin use only two map tasks when building the base cuboid? Can we specify more map tasks?

Cheers,

Jie
