Another guess: do the two mappers need to communicate? I saw that both
mappers reach a progress of 0.667 and then go no further. When I build
the same cube with a smaller data size, I see only one map task and the
cube is built successfully.
Cheers,
Jie
On 20.06.2016 at 14:57, Jie Tao wrote:
I have been working on a problem for several days without success.
Building the base cuboid hangs while flushing the map output (ca. 200 MB)
from the memory buffer to disk:
INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 190884986; bufvoid = 536870912
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 134217724(536870896); kvend = 116938212(467752848); length = 17279513/33554432
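As a sanity check on the numbers in these log lines, here is a minimal
Python sketch that decodes them. It assumes, based on my reading of the
MapTask source, that kvstart and kvend are indices into an int view of
the sort buffer and that each record takes four ints of metadata; the
variable names are only illustrative.

```python
# Values copied from the MapTask log lines above.
bufstart, bufend, bufvoid = 0, 190884986, 536870912
kvstart, kvend = 134217724, 116938212
logged_length, logged_max = 17279513, 33554432

INT_BYTES = 4  # assumption: kvstart/kvend index an int view of the buffer
NMETA = 4      # assumption: four ints of metadata per record

# Serialized key/value bytes waiting to be spilled.
data_bytes = bufend - bufstart
data_mib = data_bytes / 2**20

# Total size of the in-memory sort buffer.
buffer_mib = bufvoid / 2**20

# Metadata capacity in ints, and the maximum number of records it can hold.
meta_capacity = bufvoid // INT_BYTES
max_records = meta_capacity // NMETA

# Metadata ints in use; valid here because kvend <= kvstart (no wrap-around).
meta_ints = kvstart - kvend + 1
records = (kvstart - kvend) // NMETA + 1

print(f"data to spill : {data_mib:.1f} MiB of a {buffer_mib:.0f} MiB buffer")
print(f"metadata ints : {meta_ints} (log says {logged_length})")
print(f"max records   : {max_records} (log says {logged_max})")
print(f"records       : {records}")
print(f"avg record    : {data_bytes / records:.1f} bytes")
```

Under these assumptions the buffer is 512 MiB, which is consistent with
the mapreduce.task.io.sort.mb value of 512 quoted in this message, and
the map output held in memory is a bit over 180 MiB.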
I saw that spilling of the map output started and one spill.out file
was created on disk (only 80K), but then the process got stuck. Based on
the MapTask source code, I should see a "Finished spill" log message,
which I did not see. Does anybody have experience with this issue? Maybe
the following properties are wrong?
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
(Kylin also sets mapreduce.task.io.sort.mb, but its value was not
applied. Someone wrote that mapreduce.task.io.sort.factor should also be
set accordingly.)
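To see which values a configuration file actually carries, the
properties can be read out of the XML directly. A minimal Python sketch,
assuming the usual Hadoop layout of property elements under a
configuration root; the helper name read_properties and the inline
sample are illustrative only, with the values copied from the properties
above.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for a real configuration file (assumption:
# property elements sit under a <configuration> root).
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its value string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(SAMPLE)
for key in ("mapreduce.task.io.sort.mb", "mapreduce.task.io.sort.factor"):
    print(key, "=", props.get(key))
```

Running the same helper on the configuration of the submitted job and on
the cluster-side file would show which of the two settings won.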
Another question: why does Kylin use only two map tasks when building
the base cuboid? Can we specify more map tasks?
Cheers,
Jie