houweidong commented on issue #14742: CPU memory leak when running train_yolov3.py
URL: https://github.com/apache/incubator-mxnet/issues/14742#issuecomment-486909840

   I just added `thread_pool=True` in three places (the `# (n)` comments mark the nth added parameter):
   ```
   # imports as in the original train_yolov3.py script
   from mxnet import gluon
   from gluoncv.data.batchify import Tuple, Stack, Pad
   from gluoncv.data.dataloader import RandomTransformDataLoader
   from gluoncv.data.transforms.presets.yolo import YOLO3DefaultTrainTransform, YOLO3DefaultValTransform

   def get_dataloader(net, train_dataset, val_dataset, data_shape, batch_size, num_workers, args):
       """Get dataloader."""
       width, height = data_shape, data_shape
       # stack image, all targets generated
       batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
       if args.no_random_shape:
           train_loader = gluon.data.DataLoader(
               train_dataset.transform(YOLO3DefaultTrainTransform(width, height, net, mixup=args.mixup)),
               batch_size, True, batchify_fn=batchify_fn, last_batch='rollover',
               num_workers=num_workers, thread_pool=True)  # (1)
       else:
           transform_fns = [YOLO3DefaultTrainTransform(x * 32, x * 32, net, mixup=args.mixup)
                            for x in range(10, 20)]
           train_loader = RandomTransformDataLoader(
               transform_fns, train_dataset, batch_size=batch_size, interval=10,
               last_batch='rollover', shuffle=True, batchify_fn=batchify_fn,
               num_workers=num_workers, thread_pool=True)  # (2)
       val_batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
       val_loader = gluon.data.DataLoader(
           val_dataset.transform(YOLO3DefaultValTransform(width, height)),
           batch_size, False, batchify_fn=val_batchify_fn, last_batch='keep',
           num_workers=num_workers, thread_pool=True)  # (3)
       return train_loader, val_loader
   ```
   The problem still exists: training fails around epoch 25, which should be caused by `/dev/shm` filling up.
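   One way to check the full-`/dev/shm` theory is to log shared-memory usage once per epoch and see whether it grows. The sketch below is an assumption, not part of `train_yolov3.py`; the helper name `log_shm_usage` and the per-epoch call site are hypothetical:
   ```
   # Hypothetical helper (not from the original comment): log how full the
   # shared-memory filesystem is; call it once at the end of each epoch.
   import shutil

   def log_shm_usage(epoch, shm_path='/dev/shm'):
       """Print used vs. total space on the shared-memory filesystem."""
       usage = shutil.disk_usage(shm_path)  # named tuple: total, used, free (bytes)
       print('epoch %d: %s used %.1f MiB of %.1f MiB'
             % (epoch, shm_path, usage.used / 2**20, usage.total / 2**20))
   ```
   If the `used` figure keeps climbing from epoch to epoch while the batch size stays fixed, that would point at leaked shared-memory segments from the data-loading workers rather than at the model itself.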
