I have sketched out a fix for the flakiness here, although it's a bit of a hack 
at the moment. 
The reason for the flakiness has to do with the parallel processing and the 
random number generators used for the augmenters.

One random number generator is created per preprocessing thread 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L162).
 While the image records are retrieved in the same order 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L508)
 and always stored in the same index in the output array 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L567)
 - they are not guaranteed to be processed by the same thread every time. This 
matters because the default augmenter seeds the thread's random number generator 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L255)
 the first time it gets called 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L563).
 

This means that, across runs, the same record can be processed at a different 
point in its thread's random number stream. The augmenters then draw different 
random numbers for that record, which leads to different changes to the 
image -> flakiness.
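
To illustrate the effect in isolation, here is a minimal single-threaded 
sketch. It is not the MXNet code: `DrawWithSharedGenerator` and its 
parameters are hypothetical names, and `std::mt19937` only stands in for the 
per-thread generator. The thread is modelled as one generator that is seeded 
once and then consumed in whatever order records happen to arrive.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for one preprocessing thread: a single generator,
// seeded once, shared by every record the thread happens to process.
// `order` must be a permutation of 0..n-1 and lists the record indices in
// the order in which the thread handles them.
std::vector<std::uint32_t> DrawWithSharedGenerator(
    std::uint32_t seed, const std::vector<std::size_t>& order) {
  std::mt19937 rng(seed);  // seeded once, like the per-thread generator
  std::vector<std::uint32_t> per_record(order.size());
  for (std::size_t record : order) {
    // The value a record receives depends on how many draws came before it.
    per_record[record] = static_cast<std::uint32_t>(rng());
  }
  return per_record;
}
```

Calling this with the same seed but with the orders `{0, 1, 2}` and 
`{1, 0, 2}` gives record 0 a different value in each call, which is the same 
kind of run-to-run difference described above.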

To remedy the issue, I've changed the code so that the augmenter seed is a 
parameter to the image parser. I then use this seed and the image record index 
to seed the random number generator (outside of the default augmenter) before 
any changes are made to the image. Since the records are always retrieved in 
the same order, the same record will always have the same generator seed, 
independent of the number of threads used -> same random numbers being 
generated -> reproducible. 
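
The idea can be sketched roughly as follows. This is an illustration under 
assumptions, not the code in the PR: `MakeRecordGenerator` and 
`DrawWithPerRecordSeed` are hypothetical names, and combining the two values 
through `std::seed_seq` is just one possible way to derive a per-record seed.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: build a generator from the augmenter seed and the
// record index, so the random stream depends only on which record is being
// processed and not on the thread or the processing order.
std::mt19937 MakeRecordGenerator(std::uint32_t aug_seed,
                                 std::uint32_t record_index) {
  std::seed_seq seq{aug_seed, record_index};  // assumption: one way to mix
  return std::mt19937(seq);
}

// `order` must be a permutation of 0..n-1 and lists the record indices in
// the order in which they happen to be processed.
std::vector<std::uint32_t> DrawWithPerRecordSeed(
    std::uint32_t aug_seed, const std::vector<std::uint32_t>& order) {
  std::vector<std::uint32_t> per_record(order.size());
  for (std::uint32_t record : order) {
    // Reseed before touching the record, outside of any augmenter.
    std::mt19937 rng = MakeRecordGenerator(aug_seed, record);
    per_record[record] = static_cast<std::uint32_t>(rng());
  }
  return per_record;
}
```

With this scheme, `DrawWithPerRecordSeed(42, {0, 1, 2})` and 
`DrawWithPerRecordSeed(42, {2, 0, 1})` return the same vector, since each 
record's draw no longer depends on what was processed before it.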

It's a bit of a hack, so I would appreciate some input from the code owner 
(@anirudh2290 )  and the community.


[ Full content available at: 
https://github.com/apache/incubator-mxnet/pull/12485 ]