perdasilva commented on a change in pull request #12485: [WIP] 
test_ImageRecordIter_seed_augmentation flaky test fix
URL: https://github.com/apache/incubator-mxnet/pull/12485#discussion_r233790398
 
 

 ##########
 File path: src/io/image_iter_common.h
 ##########
 @@ -165,6 +167,8 @@ struct ImageRecParserParam : public 
dmlc::Parameter<ImageRecParserParam> {
         .describe("The data shuffle buffer size in MB. Only valid if shuffle 
is true.");
     DMLC_DECLARE_FIELD(shuffle_chunk_seed).set_default(0)
         .describe("The random seed for shuffling");
+    DMLC_DECLARE_FIELD(seed_aug).set_default(dmlc::optional<int>())
 
 Review comment:
   Fair call. However, the problem, I think, is architectural. 
   The seed_aug parameter is passed in to the default image augmenter 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L101).
 But, the actual random number generator is external, but seeded internally 
(https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L254).
   
   As I've explained above, I can't see how to meet the requirements of 
processing the images in parallel and ensuring that the output is reproducible 
(by setting the augmentation seed). 
   
   My fix for the flakiness (without violating the aforementioned 
requirements), is to ensure that the RNG used to process the image is seeded 
before processing the image within that thread it is processed in.
   
   To be clear, none of the changes made were to facilitate running the tests, 
but to actually correct the underlying problem. I agree that the PR is a little 
rough around the edges atm, but what I'm trying to do is reach out to the 
community to understand what would be the best way to  actually fix the 
underlying problem.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to