Hi all,

I am working on a matrix multiplication operation for the Mahout Flink 
Bindings that uses quite a few chained Flink DataSet operations.


When testing, I am getting the following error:


{...}

04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
java.lang.StackOverflowError
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
{...}


I've seen similar issues on the dev@flink list (and other places), but I 
believe those were caused by recursive calls or by objects that somehow 
pointed back to themselves.
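
To illustrate the general failure mode I have in mind (this is not the Mahout 
code, and I have not confirmed it is what happens in FlinkOpABt): a serializer 
that writes an object field by field recurses once per nested reference, so a 
deeply nested object graph can exhaust the stack even without a cycle. The 
sketch below uses plain Java serialization as a stand-in for Kryo; 
DeepChainDemo and Node are hypothetical names made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepChainDemo {

    /** Hypothetical node type: each node only holds a reference to the next one. */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final Node next;

        Node(Node next) {
            this.next = next;
        }
    }

    /** Builds an acyclic chain of `depth` nodes iteratively, so no recursion happens here. */
    static Node chain(int depth) {
        Node head = null;
        for (int i = 0; i < depth; i++) {
            head = new Node(head);
        }
        return head;
    }

    /** Returns true if serializing a chain of the given depth overflows the stack. */
    static boolean overflows(int depth) {
        Node head = chain(depth);
        try {
            // Stand-in for Kryo: default Java serialization also descends
            // recursively into every object-typed field.
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(head);
            return false;
        } catch (StackOverflowError e) {
            return true;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 10 overflows: " + overflows(10));
        System.out.println("depth 1000000 overflows: " + overflows(1_000_000));
    }
}
```

The depth at which the overflow appears depends on the JVM thread stack size, 
so the numbers above are only illustrative.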


This is a relatively straightforward method; it just has several Flink 
operations before execution is triggered. If I remove some operations, e.g. a 
reduce, I can get the method to complete on a simple test; however, the result 
will then, of course, be numerically incorrect.


I am wondering if there is any workaround for this type of problem?


Thank You,


Andy
