hadjiprocopis opened a new issue #17144: ImageRecordIter: sometimes crashes when number of threads > 1
URL: https://github.com/apache/incubator-mxnet/issues/17144
 
 
   Hi,
   I am using the Perl API of MXNet v1.5.1 (AI::MXNet v1.4, the latest from CPAN).
   I am on Linux (Fedora 30, kernel 5.3.13-200) and my Perl version is v5.28.2.
   
   I have compiled MXNet from the source archive
   apache-mxnet-src-1.5.1-incubating.tar.gz (I had too many problems with the git
   repo).
   
   The problem arises with the following piece of Perl code:
   
   ```
   #!/usr/bin/perl
   
   use strict;
   use warnings;
   
   use AI::MXNet qw/mx/;
   
   my $batch_size = 4;
   # num channels, width, height
   my $img_shape = [3, 256, 256];
   my $training_file = 'training.bin';
    # note: setting threadget=1 in src/io/iter_image_recordio_2.cc (see below) works around the crash
   my $train_dataiter = mx->io->ImageRecordIter({
        'path_imgrec' => $training_file,
        'path_imglist' => 'training.lst',
        # num channels, width, height
        'data_shape' => $img_shape,
        'batch_size' => $batch_size,
        'label_width' => 1,
   });
   my $batch = $train_dataiter->next();
   
   ```
   The problem is that, roughly half of the time, the above code causes a
   segmentation fault at `res.release()` inside the following function of
   `src/io/iter_image_recordio_2.cc`:
    ```
    inline size_t ImageRecordIOParser2<DType>::ParseChunk(DType* data_dptr, real_t* ...
      const size_t current_size, dmlc::InputSplit::Blob * chunk)
    ```
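   For context, here is a stripped-down sketch of the kind of OpenMP decode loop
   I believe is involved. This is my own simplification for illustration only, not
   the actual MXNet source: the buffer handling, loop bounds and names
   (`DecodeBatch`, `encoded`, `nthreads`) are invented. It only shows the pattern
   of one OpenCV decode per OpenMP thread followed by `res.release()`:
    ```
    #include <opencv2/opencv.hpp>
    #include <vector>
    
    // Illustration only: decode a batch of encoded images in parallel,
    // one cv::Mat per OpenMP thread, then release the per-thread buffer.
    void DecodeBatch(const std::vector<std::vector<unsigned char>>& encoded,
                     int nthreads) {
      #pragma omp parallel for num_threads(nthreads)
      for (int i = 0; i < static_cast<int>(encoded.size()); ++i) {
        cv::Mat buf(1, static_cast<int>(encoded[i].size()), CV_8U,
                    const_cast<unsigned char*>(encoded[i].data()));
        cv::Mat res = cv::imdecode(buf, cv::IMREAD_COLOR);  // per-thread decode
        // ... copying res into the batch buffer (data_dptr) would happen here ...
        res.release();  // the call at which my segfault is reported
      }
    }
    ```
   With `nthreads = 1` the loop is effectively sequential, which matches my
   observation that the crash disappears when only one thread is used.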
   
   The problem disappears when I force the OMP number of threads to 1 by adding
   `threadget = 1;` in that function (setting the env var `export OMP_NUM_THREADS=1`
   seems to have no effect in my case, which is why I edited the C++ file directly).
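   To rule out a shell problem, I also checked whether `OMP_NUM_THREADS` is
   honoured at all on my machine with a trivial stand-alone OpenMP program (my own
   test, nothing MXNet-specific):
    ```
    #include <cstdio>
    #include <omp.h>
    
    // Build:   g++ -fopenmp check_omp.cpp -o check_omp
    // Compare: ./check_omp   versus   OMP_NUM_THREADS=1 ./check_omp
    int main() {
      std::printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
      #pragma omp parallel
      {
        #pragma omp single
        std::printf("threads in parallel region = %d\n", omp_get_num_threads());
      }
      return 0;
    }
    ```
   If the variable is honoured there but still has no effect inside MXNet, that
   would suggest the iterator picks its own thread count internally. If I read the
   parameter list correctly, ImageRecordIter also accepts a `preprocess_threads`
   option, which might be a cleaner way to pin the decoder to one thread than
   editing the source; I have not confirmed this.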
   
   I assume the input training file is correct, because the corresponding Python
   code works fine:
   
   ```
   import mxnet as mx
   import numpy as np
   import matplotlib.pyplot as plt
   import cv2
   
   # this works
   
   dataiter = mx.io.ImageRecordIter(
     path_imgrec="training.bin",
     path_imglist="training.lst",
     data_shape=(3,256,256),
     batch_size=4,
     label_width=1
   )
   
   batch = dataiter.next() # first batch.
    images = batch.data[0] # This will contain 4 (=batch_size) images, each of shape 3x256x256.
   
   for i in range(4):
       plt.subplot(1,4,i+1)
       plt.imshow(images[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
   plt.show()
   
   ```
   
   For debugging my case I have the following settings in config.mk
   ```
   DEV = 1
   DEBUG = 1
   USE_CUDA = 0
   ENABLE_CUDA_RTC = 1
   USE_CUDNN = 0
   USE_NVTX = 0
   USE_NCCL = 0
   USE_OPENCV = 1
   USE_LIBJPEG_TURBO = 0
   USE_OPENMP = 1
   USE_MKLDNN =
   USE_JEMALLOC = 0
   USE_GPERFTOOLS = 1
   USE_CPP_PACKAGE = 1
   ```
   Any ideas as to why this crash happens? Will I ever be able to decode images
   using more than one thread?
   
   I also have the following 3 side questions:
   1) What is the corresponding C++ code for the above Perl (and Python) script?
   (I have put my own attempt right after this list.)
   2) Any ideas how to control the number of threads via environment variables?
   `OMP_NUM_THREADS`, `MXNET_CPU_PRIORITY_NTHREADS` and `MXNET_CPU_WORKER_NTHREADS`
   seem to have absolutely no effect in my case: neither
   `OMP_NUM_THREADS=1 dataloader.pl` nor `export OMP_NUM_THREADS=1; dataloader.pl`
   affects the number of threads used in the aforementioned
   `ImageRecordIOParser2<DType>::ParseChunk()` of `src/io/iter_image_recordio_2.cc`.
   3) I need C++ examples!
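   
   Regarding questions 1 and 3, below is my attempt at the C++ equivalent using
   the cpp-package (I build with `USE_CPP_PACKAGE = 1`, see above). I have not
   verified it against the 1.5.1 headers, so treat the exact names (`MXDataIter`,
   `SetParam`, `CreateDataIter`, `GetData`, `GetLabel`) as my reading of the
   examples under `cpp-package/example/` rather than as authoritative:
    ```
    #include "mxnet-cpp/MxNetCpp.h"
    
    using namespace mxnet::cpp;
    
    int main() {
      const int batch_size = 4;
    
      // Rough C++ (cpp-package) counterpart of the Perl/Python scripts above.
      auto train_iter = MXDataIter("ImageRecordIter")
          .SetParam("path_imgrec", "training.bin")
          .SetParam("path_imglist", "training.lst")
          .SetParam("data_shape", Shape(3, 256, 256))  // same shape as above
          .SetParam("label_width", 1)
          .SetParam("batch_size", batch_size)
          // assumption: this parameter controls the decoding threads
          .SetParam("preprocess_threads", 1)
          .CreateDataIter();
    
      if (train_iter.Next()) {                   // fetch the first batch, as above
        NDArray data  = train_iter.GetData();    // (batch_size, 3, 256, 256)
        NDArray label = train_iter.GetLabel();   // (batch_size,)
        data.WaitToRead();                       // make sure decoding has finished
      }
    
      MXNotifyShutdown();
      return 0;
    }
    ```
   The files under `cpp-package/example/` in the source tree (the alexnet and mlp
   ones use `MXDataIter`) seem to be the closest thing to official C++ examples
   for the data iterators.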
   
