On Tue, 6 Mar 2018 12:52:14, Robert Kern wrote:
> I would just recommend using one of the codebases to initialize the
> network, save the network out to disk, and load up the initialized network
> in each of the different codebases for training. That way you are sure
> they are both starting from the same exact network parameters.
> Even if you do rewrite a precisely equivalent np.random.randn() for
> Scala/Java, you ought to write the code to serialize the initialized
> network anyways so that you can test that the two initialization routines
> are equivalent. But if you're going to do that, you might as well take my
> recommended approach.
Thanks for the suggestion! I decided to use the approach you proposed.
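For reference, that workflow can be sketched roughly as follows (layer sizes,
variable names, and the file name are made up for illustration; the real
network will have its own parameters):

```python
import numpy as np

# Hypothetical sketch of the suggested workflow: initialize once in NumPy,
# serialize the parameters to disk, and load the same file from every
# implementation so all of them start from identical weights.
n_x, n_h = 3, 4  # example layer sizes, chosen arbitrarily

np.random.seed(0)
W1 = np.random.randn(n_h, n_x) * 0.01  # same init as in the post below
b1 = np.zeros((n_h, 1))

# Save the initialized parameters; the Scala side can read this file
# (e.g. via an .npz/.npy reader) instead of re-implementing randn.
np.savez("init_params.npz", W1=W1, b1=b1)

# Loading them back gives bit-identical arrays.
loaded = np.load("init_params.npz")
assert np.array_equal(loaded["W1"], W1)
assert np.array_equal(loaded["b1"], b1)
```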
Still, I'm puzzled by an issue that seems to be related to random
initialization. I have three different NN implementations: two in Scala and one in NumPy.
When using the exact same initialization parameters I get the same
cost after each training iteration from each implementation. Based on that,
I'd infer that the implementations work equivalently.
However, the results look very different when using random initialization.
With respect to the exact cost this is of course expected, but what I find
puzzling is that after N training iterations the cost starts approaching zero
with the NumPy code (most of the time), whereas with the Scala-based
implementations it fails to converge (most of the time).
With NumPy I'm simply using the following random initialization code:
np.random.randn(n_h, n_x) * 0.01
I'm trying to emulate the same behaviour in my Scala code by sampling from
a Gaussian distribution with mean = 0 and std dev = 1.
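One detail worth noting: because of the `* 0.01` factor, the NumPy weights
are effectively drawn from a Gaussian with mean 0 and std dev 0.01, not 1.
A quick NumPy check of that (sample size chosen arbitrarily):

```python
import numpy as np

np.random.seed(0)

# np.random.randn draws from a standard normal (mean 0, std dev 1);
# multiplying by 0.01 rescales the std dev to 0.01.
sample = np.random.randn(100_000) * 0.01

print(round(sample.mean(), 3))  # close to 0.0
print(round(sample.std(), 3))   # close to 0.01, not 1.0
```

So whatever the Scala code samples should end up with that same 0.01 scale,
or the two initializations won't be comparable.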
NumPy-Discussion mailing list