benqua commented on issue #8129: [scala] Module api: label (1,1,868,868) and prediction (1,868,868)should have the same length
URL: https://github.com/apache/incubator-mxnet/issues/8129#issuecomment-333992969

OK, I checked and logged the shapes as suggested, and realized that it is not the batch-size dimension that is lost but the channel one. Both were 1 in my case, so I didn't notice at first.

The network is a U-Net very similar to the one described in the original U-Net paper. As in that paper, each pixel belongs to one of two classes, so the last layers are:

```scala
// output: a 1x1 convolution producing one channel per class, then a per-pixel softmax
val conv10 = Symbol.Convolution()()(Map("data" -> conv9, "num_filter" -> 2, "kernel" -> "(1,1)"))
val label = Symbol.Variable("softmax_label")
val so = Symbol.SoftmaxOutput()()(Map("data" -> conv10, "label" -> label, "multi_output" -> true))
```

Now, when I run the training code (posted above) with more logging, I get the following:

```
2017-10-03 23:40:37,808 [run-main-0] [UNet] [INFO] - so - Shape: Vector((1,2,868,868))
2017-10-03 23:40:37,897 [run-main-0] [TrainModuleUNet] [INFO] - symbol shape: Vector((1,2,868,868))
2017-10-03 23:40:37,899 [run-main-0] [TrainModuleUNet] [INFO] - providedData: data -> (1,1,1052,1052)
2017-10-03 23:40:37,900 [run-main-0] [TrainModuleUNet] [INFO] - providedLabel: softmax_label -> (1,1,868,868)
2017-10-03 23:40:38,038 [run-main-0] [TrainModuleUNet] [INFO] - bound!
2017-10-03 23:40:38,088 [run-main-0] [TrainModuleUNet] [INFO] - initialized!
2017-10-03 23:40:38,089 [run-main-0] [ml.dmlc.mxnet.module.Module] [WARN] - Already binded, ignoring bind()
MKL Build:20170720
[error] (run-main-0) java.lang.IllegalArgumentException: requirement failed: label (1,1,868,868) and prediction (1,868,868)should have the same length.
java.lang.IllegalArgumentException: requirement failed: label (1,1,868,868) and prediction (1,868,868)should have the same length.
    at scala.Predef$.require(Predef.scala:224)
    at ml.dmlc.mxnet.Accuracy$$anonfun$update$4.apply(EvalMetric.scala:111)
    (...)
```

The output of my network has shape (1, 2, 868, 868), yet the error message reports the prediction shape as (1, 868, 868). How can this be?

I also see that my label is probably not in the right shape: it has one channel holding either 0 or 1, instead of two channels holding the probabilities of classes 0 and 1. However, binding seems to succeed, which makes me think an implicit conversion may be happening somewhere.

Another very strange thing is that the program doesn't actually stop after this exception: memory and CPU usage keep growing until I kill sbt. Despite the failed `require`, the C++ backend keeps working...

Any hint about how to correctly use SoftmaxOutput with `multi_output` would be greatly appreciated. :)
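A minimal sketch of the shape arithmetic, in plain Scala with no MXNet dependency. It assumes (based only on the stack trace pointing into `Accuracy.update`) that the accuracy metric reduces the prediction with an argmax over the channel axis (axis 1) before comparing it with the label. `fmt`, `argmaxShape`, and `squeezeChannel` are hypothetical helpers that only manipulate shape tuples, not NDArrays.

```scala
object ShapeCheck {
  // Hypothetical helper: format a shape the way the log does, e.g. (1,868,868).
  def fmt(shape: Seq[Int]): String = shape.mkString("(", ",", ")")

  // Hypothetical helper: shape left after an argmax over `axis`; that axis is removed.
  def argmaxShape(shape: Seq[Int], axis: Int): Seq[Int] = {
    require(axis >= 0 && axis < shape.length, s"axis $axis out of range for ${fmt(shape)}")
    shape.patch(axis, Nil, 1)
  }

  // Hypothetical helper: drop a singleton channel axis, e.g. (N,1,H,W) -> (N,H,W).
  def squeezeChannel(shape: Seq[Int]): Seq[Int] = {
    require(shape.length > 1 && shape(1) == 1, s"axis 1 of ${fmt(shape)} is not a singleton")
    shape.patch(1, Nil, 1)
  }

  def main(args: Array[String]): Unit = {
    val output = Seq(1, 2, 868, 868) // SoftmaxOutput shape from the log (2 classes)
    val label  = Seq(1, 1, 868, 868) // providedLabel shape from the log

    // Assumption: the metric compares the label with argmax(output, axis = 1).
    val pred = argmaxShape(output, 1)
    println(s"prediction ${fmt(pred)} vs label ${fmt(label)}: equal = ${pred == label}")

    // Dropping the label's singleton channel makes the two shapes agree.
    val squeezed = squeezeChannel(label)
    println(s"prediction ${fmt(pred)} vs label ${fmt(squeezed)}: equal = ${pred == squeezed}")
  }
}
```

Under that assumption, the (1,868,868) in the error would simply be the argmax of the (1,2,868,868) output, and a label with its singleton channel dropped would match it. Whether SoftmaxOutput with `multi_output` actually expects labels in this (N, H, W) class-index layout is an assumption to check against the operator documentation.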
With regards, Apache Git Services