YutingZhang commented on issue #13318: Improving multi-processing reliability 
for gluon DataLoader
URL: https://github.com/apache/incubator-mxnet/pull/13318#issuecomment-440552558
 
 
   @zhreshold This was actually a bit confusing to me too, but that is what 
happened. 
   
   One guess:
   
   Is there any size limit or get-put synchronization on the `data_queue`? Is it 
possible that a worker got stuck at `data_queue.put`? The `fetch_loop` thread can 
be joined before the workers (this is possible in the original code, and more 
likely in my PR), and if the `data_queue` is full at that point, the `put` can 
get stuck. Is there any such possibility? 
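
   To make the guess concrete, here is a minimal sketch of the failure mode I 
have in mind. It is not the DataLoader code: it uses a standard-library 
`queue.Queue` with `maxsize=1` as a stand-in for `data_queue` (whether the real 
queue is bounded is exactly the open question), and the helper names 
`worker_put` and `demo_full_queue` are hypothetical. A `timeout` is used only so 
the example terminates; a plain `put` would block indefinitely at the same spot.

```python
import queue
import threading


def worker_put(data_queue, item, timeout):
    """Try to deliver one item; return False if the queue stayed full."""
    try:
        # With timeout=None this call would block forever on a full queue.
        data_queue.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


def demo_full_queue():
    # Assumption: a bounded queue stands in for the DataLoader's data_queue.
    data_queue = queue.Queue(maxsize=1)
    # Fill the queue. No consumer is running, which models a fetch_loop
    # thread that has already been joined.
    data_queue.put((0, "batch0"))

    results = []
    worker = threading.Thread(
        target=lambda: results.append(worker_put(data_queue, (1, "batch1"), 0.2))
    )
    worker.start()
    worker.join()
    return results[0]


if __name__ == "__main__":
    print("second put delivered:", demo_full_queue())
```

   If the real queue behaves like this, a worker that is still producing after 
the consumer has exited would never return from `put`, and joining it would hang.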
   
   By the way, I tried joining the workers before sending `(None, None)` to the 
`fetch_loop` thread, but this can cause the `fetch_loop` to get stuck at 
`data_queue.get` (the requested data is not in the queue, and no worker will 
put it there). This can block the main thread if we try to join the 
`fetch_loop` thread, or leave a dangling thread otherwise.
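
   The ordering problem can be sketched the same way. Again, this is only an 
illustration under assumptions, not the actual implementation: `fetch_loop` 
here is a simplified stand-in that reads `(idx, batch)` pairs until it sees the 
`(None, None)` sentinel, `run` is a hypothetical driver, and the `timeout` 
exists only so the example can report the hang instead of reproducing it.

```python
import queue
import threading


def fetch_loop(data_queue, out, timeout=0.2):
    """Consume (idx, batch) pairs until the (None, None) sentinel arrives."""
    while True:
        try:
            idx, batch = data_queue.get(timeout=timeout)
        except queue.Empty:
            # A plain get() would block here forever: the workers are gone,
            # so nothing will ever be put into the queue again.
            out.append("stuck")
            return
        if idx is None:
            out.append("terminated")
            return
        out.append(idx)


def run(send_sentinel):
    data_queue = queue.Queue()
    out = []
    consumer = threading.Thread(target=fetch_loop, args=(data_queue, out))
    consumer.start()
    # At this point the workers are assumed to be joined already, so the
    # sentinel is the only thing that can still wake the consumer.
    if send_sentinel:
        data_queue.put((None, None))
    consumer.join()
    return out[-1]


if __name__ == "__main__":
    print("sentinel sent:    ", run(True))
    print("sentinel not sent:", run(False))
```

   Without the sentinel, the consumer has no way to learn that no more data is 
coming, which matches the stuck `data_queue.get` I observed.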
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
