leezu commented on pull request #19748: URL: https://github.com/apache/incubator-mxnet/pull/19748#issuecomment-761577683
> The reason why using Dataloader with auto_reload is:
> MXNet 2.0's DataLoader with the default nopython mode prefetch data by default.

MXNet 2 uses version number 2 because it breaks APIs. MXNet follows https://semver.org/, so we must not introduce backward-incompatible changes in the v1.x branch. (Changing defaults with major impact is backward incompatible.) It's fine to add new features in v1.x.

> There is only one iter for a DataLoader in most of the cases.(Thus only one prefetched iter is generated.)
> if we call iter explicitly, we should call it twice (one right after the define of the DataLoader, and another one after the previous iter is consumed).

So what's the problem here?

Currently I'm not convinced your code / documentation is correct. For example:

```
>>> train_iter = DataLoader(train_data.transform_first(transform_train),
...                         batch_size=1,num_workers=1)
(pre)fetching data here
>>> it = iter(train_iter) # nothing is generated since lazy-evaluation occurs
>>> it2 = iter(train_iter)
>>> it3 = iter(train_iter)
>>> it4 = iter(train_iter)
>>> _ = next(it2) # the first iter we are using is the prefetched iter.
>>> _ = next(it) # since the prefetched iter is consumed, we have to fetch data for `it`.
```

However, looking at your implementation, 4 prefetched iters are actually created, and the comments on the last two lines are wrong. Please correct me if you disagree.

> (maybe we should not using with ag.record(): since "Explicit is better than implicit." (Zen of Python))

What's the relation to the current discussion?
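To make the distinction concrete, here is a minimal pure-Python sketch of the two behaviours under discussion. `EagerLoader` and `LazyLoader` are hypothetical toy classes, not MXNet's `DataLoader`, and the counters stand in for real prefetching work. It only illustrates the general point: if `__iter__` itself starts the work, then four `iter()` calls start four prefetches, whereas a generator-based `__iter__` does nothing until the first `next()`.

```python
class EagerLoader:
    """Hypothetical loader: every iter() call starts prefetching immediately."""

    def __init__(self, data):
        self.data = list(data)
        self.prefetch_count = 0

    def __iter__(self):
        self.prefetch_count += 1      # work happens at iter() time
        return iter(list(self.data))  # the copy stands in for a prefetched buffer


class LazyLoader:
    """Hypothetical loader: nothing runs until the first next() call."""

    def __init__(self, data):
        self.data = list(data)
        self.started_count = 0

    def __iter__(self):
        return self._generate()       # creates a generator; its body has not run yet

    def _generate(self):
        self.started_count += 1       # executes only on the first next()
        yield from self.data


eager = EagerLoader(range(3))
eager_iters = [iter(eager) for _ in range(4)]
print(eager.prefetch_count)   # 4: one prefetch per iter() call

lazy = LazyLoader(range(3))
lazy_iters = [iter(lazy) for _ in range(4)]
print(lazy.started_count)     # 0: no iterator has been advanced yet
next(lazy_iters[1])
print(lazy.started_count)     # 1: only the iterator that was advanced did any work
```

Which of the two patterns the implementation in this PR matches is exactly what needs to be checked against the comments in the documented example.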
