perdasilva commented on a change in pull request #12485: [WIP]
test_ImageRecordIter_seed_augmentation flaky test fix
URL: https://github.com/apache/incubator-mxnet/pull/12485#discussion_r230650963
##########
File path: src/io/iter_image_recordio_2.cc
##########
@@ -518,6 +518,17 @@ inline unsigned
ImageRecordIOParser2<DType>::ParseChunk(DType* data_dptr, real_t
cv::Mat res;
rec.Load(blob.dptr, blob.size);
cv::Mat buf(1, rec.content_size, CV_8U, rec.content);
+
+ if (idx % 1000 == 0) {
+ if (param_.seed_aug.has_value()) {
+ LOG(INFO) << "aug seed: " << param_.seed_aug.value();
+ }
+ LOG(INFO) << "tid: " << tid << " idx: " << idx << " index: " << rec.image_index();
+ }
+ if (param_.seed_aug.has_value()) {
+ prnds_[tid]->seed(idx + param_.seed_aug.value());
Review comment:
There are two requirements here: setting the seed must yield reproducible
results, and the image augmentation must still run in parallel. We need to
reset the seed for each image because there is no guarantee that the same
image will be processed by the same thread, or even that it will be the ith
image processed by that thread on every run (or on different hardware, since
the code figures out the number of threads to use for processing). Therefore,
resetting the random number generator at the start of processing an image is
the only way (at least that I could think of) to guarantee that, when a fixed
seed is set, the same random distortions are applied to the same image in a
multi-threaded environment, independent of the hardware being used. I hope
this is clear. It's not the easiest topic to discuss in written form lol.
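
To make the argument concrete, here is a minimal sketch of the idea, not the actual MXNet code. It assumes each worker owns a `std::mt19937` (a stand-in for `prnds_[tid]`), and it simulates the partitioning of images across workers in a single thread rather than spawning real ones. The names `AugmentOne` and `AugmentAll` are hypothetical, and the single random draw stands in for a real augmentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for augmenting one image: returns one random draw.
// When reseed_per_item is true, the generator is reset from (idx + seed_aug)
// first, mirroring prnds_[tid]->seed(idx + param_.seed_aug.value()).
std::uint32_t AugmentOne(std::mt19937* rng, std::size_t idx,
                         std::uint32_t seed_aug, bool reseed_per_item) {
  if (reseed_per_item) {
    rng->seed(static_cast<std::uint32_t>(idx) + seed_aug);
  }
  return (*rng)();
}

// Simulates num_workers workers, each with its own generator, taking items
// in a strided pattern. Assumption: real threads would split the work in
// some similar, hardware-dependent way.
std::vector<std::uint32_t> AugmentAll(std::size_t num_items,
                                      unsigned num_workers,
                                      std::uint32_t seed_aug,
                                      bool reseed_per_item) {
  std::vector<std::uint32_t> out(num_items);
  for (unsigned tid = 0; tid < num_workers; ++tid) {
    std::mt19937 rng(seed_aug);  // per-worker generator, seeded once up front
    for (std::size_t idx = tid; idx < num_items; idx += num_workers) {
      out[idx] = AugmentOne(&rng, idx, seed_aug, reseed_per_item);
    }
  }
  return out;
}
```

With `reseed_per_item` set to true, the value produced for item `idx` depends only on `idx` and the seed, so `AugmentAll(16, 1, 42, true)` and `AugmentAll(16, 4, 42, true)` return the same vector. With it set to false, the result depends on how many workers there are and on the order in which each one sees its items, which is exactly the source of the flakiness.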