hadjiprocopis opened a new issue #17144: ImageRecordIter : sometimes crashes when number of threads > 1
URL: https://github.com/apache/incubator-mxnet/issues/17144

Hi,

I am using the Perl API (AI::MXNet, the latest from CPAN, v1.4) of MXNet v1.5.1. I am on Linux (Fedora 30, kernel 5.3.13-200) and my Perl version is v5.28.2.

I compiled MXNet from the source archive `apache-mxnet-src-1.5.1-incubating.tar.gz` (I had too many problems with the git repo).

The problem arises with the following piece of Perl code:

```
#!/usr/bin/perl
use strict;
use warnings;

use AI::MXNet qw/mx/;

my $batch_size = 4;
# num channels, height, width
my $img_shape = [3, 256, 256];
my $training_file = 'training.bin';

# set src/io/iter_image_recordio_2.cc threadget=1
my $train_dataiter = mx->io->ImageRecordIter({
    'path_imgrec'  => $training_file,
    'path_imglist' => 'training.lst',
    # num channels, height, width
    'data_shape'   => $img_shape,
    'batch_size'   => $batch_size,
    'label_width'  => 1,
});
my $batch = $train_dataiter->next();
```

The problem is that the above code sometimes (roughly 50-50) causes a segmentation fault at `res.release()` in the function

```
inline size_t ImageRecordIOParser2<DType>::ParseChunk(DType* data_dptr, real_t*$
    const size_t current_size, dmlc::InputSplit::Blob * chunk)
```

in file `src/io/iter_image_recordio_2.cc`.

The problem disappears when I set the OpenMP number of threads to 1 by adding `threadget=1;` in said function. (Setting the env var with `export OMP_NUM_THREADS=1` seems to have no effect in my case, which is why I edited the C++ file.)

I assume the input training file is correct because the corresponding Python code works fine:

```
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
import cv2

# this works
dataiter = mx.io.ImageRecordIter(
    path_imgrec="training.bin",
    path_imglist="training.lst",
    data_shape=(3,256,256),
    batch_size=4,
    label_width=1
)
batch = dataiter.next() # first batch.
images = batch.data[0] # This will contain 4 (=batch_size) images, each of 3x256x256.
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(images[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
plt.show()
```

For debugging my case I have the following settings in config.mk:

```
DEV = 1
DEBUG = 1
USE_CUDA = 0
ENABLE_CUDA_RTC = 1
USE_CUDNN = 0
USE_NVTX = 0
USE_NCCL = 0
USE_OPENCV = 1
USE_LIBJPEG_TURBO = 0
USE_OPENMP = 1
USE_MKLDNN =
USE_JEMALLOC = 0
USE_GPERFTOOLS = 1
USE_CPP_PACKAGE = 1
```

Any ideas as to why this crash happens? Will I ever be able to decode images using multiple threads?

I also have the following 3 side questions:

1) What is the corresponding C++ code for the above Perl (and Python) script?

2) Any ideas how to control the number of threads via environment variables? It seems that `OMP_NUM_THREADS`, `MXNET_CPU_PRIORITY_NTHREADS` and `MXNET_CPU_WORKER_NTHREADS` have absolutely no effect in my case. For example, `OMP_NUM_THREADS=1 dataloader.pl` or `export OMP_NUM_THREADS=1; dataloader.pl` do not affect the number of threads in the aforementioned function (`ImageRecordIOParser2()` in file `src/io/iter_image_recordio_2.cc`).

3) I need C++ examples!
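Since the crash is intermittent, it may help to measure how often it happens instead of estimating "50-50". The following is only a minimal sketch, not part of MXNet: the helper name `count_segfaults` is hypothetical, and it assumes a POSIX system where a child process killed by a signal is reported with a negative return code.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=20, env=None):
    """Run `cmd` `runs` times and return how many runs were killed by SIGSEGV.

    Assumption: POSIX semantics, where subprocess reports a child killed by
    signal N with return code -N. `cmd` must be an argument list that is
    executed directly (not through a shell, which would report 128+N instead).
    """
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            env=env,  # None means: inherit the current environment
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, `count_segfaults(["perl", "dataloader.pl"], runs=20)` would count the crashes over 20 runs of the script above, and passing `env=dict(os.environ, OMP_NUM_THREADS="1")` (after `import os`) would repeat the measurement with the environment variable set, which gives a direct check of whether the variable changes the crash rate.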
