andrei5055 commented on issue #17335: Excessive GPU memory usage with dynamic shape input using Gluon interface
URL: https://github.com/apache/incubator-mxnet/issues/17335#issuecomment-613179251

@Jerryzcn, @szha. Thanks so much for your suggestions! I tried them, and this is what I saw in my experiments.

1. I did not notice any advantage from using `mx.gpu(0).empty_cache()`. On the contrary, I sometimes saw new crashes with an error message associated with this call.
2. Yes, I saw about a 25-30% slowdown associated with using `mx.nd.waitall()`. Unfortunately, it does not solve the initial GPU memory problem.
3. What makes this script approx. 30% faster is using `del data_loader` after the inner loop in

```
for epoch in range(10):
    logger.info('Current batch size is: %d' % batchsize)
    os.system('free >> freeMemory.txt')
    data_loader = DataLoader(dataset, batch_size=batchsize,
                             batchify_fn=Tuple([Pad(pad_val=0), Pad(pad_val=0)]),
                             num_workers=8)
    mx.gpu(0).empty_cache()
    btic = time.time()
    etic = time.time()
    for i, (src_data, dst_data) in enumerate(data_loader):
        . . .
        # Begin to update the parameter
        trainer.step(batchsize)
        . . .
    del data_loader
    batchsize *= 2
```

Here is some evidence for this statement.
Initial version launched on datasetSize = 1000:

```
[Epoch 0], Speed: 95.212 samples/sec
[Epoch 1], Speed: 186.922 samples/sec
[Epoch 2], Speed: 312.916 samples/sec
[Epoch 3], Speed: 434.118 samples/sec
[Epoch 4], Speed: 405.835 samples/sec
[Epoch 5], Speed: 222.373 samples/sec
[Epoch 6], Speed: 18.799 samples/sec
[Epoch 7], Speed: 88.688 samples/sec
[Epoch 8], Speed: 371.118 samples/sec
[Epoch 9], Speed: 75.738 samples/sec
1.993E+02 GPU stress test elapsed time
```

With `del data_loader` on the same dataset:

```
[Epoch 0], Speed: 93.743 samples/sec
[Epoch 1], Speed: 180.014 samples/sec
[Epoch 2], Speed: 298.007 samples/sec
[Epoch 3], Speed: 405.286 samples/sec
[Epoch 4], Speed: 527.154 samples/sec
[Epoch 5], Speed: 633.825 samples/sec
[Epoch 6], Speed: 30.708 samples/sec
[Epoch 7], Speed: 64.030 samples/sec
[Epoch 8], Speed: 207.408 samples/sec
[Epoch 9], Speed: 140.913 samples/sec
1.368E+02 GPU stress test elapsed time
```
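
For reference, the "approx. 30% faster" figure can be checked against the two logs with a few lines of arithmetic. This is only a sketch: the numbers are copied from the output above, and the variable names are illustrative and not part of the original script.

```python
# Figures copied verbatim from the two logs above (units as logged).
baseline_elapsed = 1.993e2   # initial version
with_del_elapsed = 1.368e2   # with `del data_loader`

# Per-epoch throughput in samples/sec, epochs 0..9.
baseline_speed = [95.212, 186.922, 312.916, 434.118, 405.835,
                  222.373, 18.799, 88.688, 371.118, 75.738]
with_del_speed = [93.743, 180.014, 298.007, 405.286, 527.154,
                  633.825, 30.708, 64.030, 207.408, 140.913]

# Relative reduction in total elapsed time, and the equivalent speedup factor.
reduction = (baseline_elapsed - with_del_elapsed) / baseline_elapsed
speedup = baseline_elapsed / with_del_elapsed

# Epochs in which the run with `del data_loader` reported higher throughput.
faster_epochs = [epoch
                 for epoch, (base, with_del)
                 in enumerate(zip(baseline_speed, with_del_speed))
                 if with_del > base]

print(f"Elapsed time reduced by {reduction:.1%}")
print(f"Overall speedup: {speedup:.2f}x")
print(f"Epochs where the `del` run was faster: {faster_epochs}")
```

The total elapsed time drops by roughly 31%, which matches the claim. The per-epoch speeds are less uniform: the `del` run is faster only in some epochs, so with a single run per variant the comparison is probably best read through the total elapsed time.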
