perdasilva commented on a change in pull request #12485: [WIP]
test_ImageRecordIter_seed_augmentation flaky test fix
URL: https://github.com/apache/incubator-mxnet/pull/12485#discussion_r233790398
##########
File path: src/io/image_iter_common.h
##########
@@ -165,6 +167,8 @@ struct ImageRecParserParam : public
dmlc::Parameter<ImageRecParserParam> {
.describe("The data shuffle buffer size in MB. Only valid if shuffle
is true.");
DMLC_DECLARE_FIELD(shuffle_chunk_seed).set_default(0)
.describe("The random seed for shuffling");
+ DMLC_DECLARE_FIELD(seed_aug).set_default(dmlc::optional<int>())
Review comment:
Fair call. However, the problem, I think, is architectural.
The seed_aug parameter is passed in to the default image augmenter
(https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L101).
But, the actual random number generator is external, but seeded internally
(https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L254).
As I've explained above, I can't see how to meet the requirements of
processing the images in parallel and ensuring that the output is reproducible
(by setting the augmentation seed).
My fix for the flakiness (without violating the aforementioned
requirements), is to ensure that the RNG used to process the image is seeded
before processing the image within that thread it is processed in.
To be clear, none of the changes made were to facilitate running the tests,
but to actually correct the underlying problem. I agree that the PR is a little
rough around the edges atm, but what I'm trying to do is reach out to the
community to understand what would be the best way to actually fix the
underlying problem.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services