perdasilva commented on a change in pull request #12485: [WIP]
test_ImageRecordIter_seed_augmentation flaky test fix
URL: https://github.com/apache/incubator-mxnet/pull/12485#discussion_r230650963
##########
File path: src/io/iter_image_recordio_2.cc
##########
@@ -518,6 +518,17 @@ inline unsigned ImageRecordIOParser2<DType>::ParseChunk(DType* data_dptr, real_t
cv::Mat res;
rec.Load(blob.dptr, blob.size);
cv::Mat buf(1, rec.content_size, CV_8U, rec.content);
+
+ if (idx % 1000 == 0) {
+ if (param_.seed_aug.has_value()) {
+ LOG(INFO) << "aug seed: " << param_.seed_aug.value();
+ }
+ LOG(INFO) << "tid: " << tid << " idx: " << idx << " index: " << rec.image_index();
+ }
+ if (param_.seed_aug.has_value()) {
+ prnds_[tid]->seed(idx + param_.seed_aug.value());
Review comment:
There are two requirements here: setting the seed must yield reproducible
results, and parallelization should be used to speed up image augmentation.
We need to reset the seed for each image because there is no guarantee that
the same image will be processed by the same thread, or even that it will be
the ith image processed by that thread, across independent runs or different
hardware (the code figures out the number of threads to use for processing).
Therefore, resetting the random number generator at the start of processing
an image is the only way (at least that I could think of) to guarantee
reproducibility when setting a fixed seed. That is, to guarantee that the
same random distortions will be applied to the same image across independent
runs and different hardware configurations. I hope this makes it a little
clearer. It's not the easiest topic to discuss in written form lol.
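
To make the argument above concrete, here is a minimal standalone sketch (not the MXNet code). `Augment`, `base_seed`, and the strided work split are hypothetical names and choices made up for illustration; a single raw `std::mt19937` draw stands in for the real image distortion. Each thread keeps its own generator, as `prnds_[tid]` does in the diff, and re-seeds it with `base_seed + idx` before handling item `idx`:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for random augmentation: one random draw per item.
// Re-seeding with (base_seed + idx) before each item makes the draw depend
// only on the item index, not on which thread handles it or in what order.
std::vector<std::uint32_t> Augment(std::size_t num_items,
                                   unsigned num_threads,
                                   std::uint32_t base_seed) {
  std::vector<std::uint32_t> out(num_items);
  std::vector<std::thread> workers;
  for (unsigned tid = 0; tid < num_threads; ++tid) {
    workers.emplace_back([&out, num_items, num_threads, base_seed, tid]() {
      std::mt19937 rng;  // one generator per thread
      // Strided split (an assumption): thread tid handles tid, tid + T, ...
      for (std::size_t idx = tid; idx < num_items; idx += num_threads) {
        rng.seed(base_seed + static_cast<std::uint32_t>(idx));
        out[idx] = rng();  // each thread writes only its own slots
      }
    });
  }
  for (auto& w : workers) w.join();
  return out;
}
```

With this scheme, `Augment(64, 1, 42)` and `Augment(64, 4, 42)` should return the same vector, since the value for each item no longer depends on the thread count. If each thread were seeded only once at startup, the result would instead change with the number of threads and with how items are assigned to them.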