kjchalup opened a new issue #11883: ImageIter last batch / batch padding 
behavior
URL: https://github.com/apache/incubator-mxnet/issues/11883
 
 
   ## Description
   `mxnet.image.ImageIter` offers no way to choose how the last batch is 
handled when the batch size does not evenly divide the dataset size. In contrast, 
[mxnet.io.NDArrayIter](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.NDArrayIter)
 has a `last_batch_handle` keyword argument that lets the user choose what to 
do.
   
   This can lead to unexpected behavior:
   
   ```
   # Create an iterator over an image dataset containing a total of 8 images.
   diter = mxnet.image.ImageIter(batch_size=5, ...)
   batch1 = diter.next()
   batch2 = diter.next()  # last batch: only 3 images remain
   print(batch2.label)
   ```
   
   output:
   ```
   [
    [3.0000000e+00 1.0000000e+00 4.0000000e+00 2.5249697e-29 2.8025969e-45]
    <NDArray 5 @cpu(0)>]
   ```
   
   In this case, the last batch has only 3 legit labels, and the last two 
labels are garbage. I think the user is expected to check the padding manually:
   
   ```
   print('batch1.pad = {}, batch2.pad = {}'.format(batch1.pad, batch2.pad))
   ```
   
   output:
   ```
   batch1.pad = 5, batch2.pad = 2
   ```
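
A minimal sketch of that manual check, written in plain Python so it runs without MXNet. The helper name `strip_padding` is hypothetical and not part of any MXNet API; it assumes `pad` counts the trailing filler entries of a batch, as `batch2.pad` does in the report above.

```python
def strip_padding(values, pad):
    """Return only the leading entries of a batch that are real samples.

    `values` is any sliceable sequence with the batch axis first;
    `pad` is the number of trailing filler entries (an assumption
    about what the batch's `pad` attribute means).
    """
    if pad < 0 or pad > len(values):
        raise ValueError("pad must be between 0 and the batch size")
    return values[:len(values) - pad]


# Labels of the second batch from the report: 3 real labels, 2 filler values.
labels = [3.0, 1.0, 4.0, 2.5249697e-29, 2.8025969e-45]
print(strip_padding(labels, pad=2))  # prints [3.0, 1.0, 4.0]
```

With a real batch, the equivalent call would presumably be `strip_padding(batch2.label[0], batch2.pad)`, assuming the label array supports `len()` and slicing along its first axis.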
   
   But the `pad` attribute is not documented in the [mxnet.image API 
reference](https://mxnet.incubator.apache.org/api/python/image/image.html). In 
addition, `batch2.data` is *not* filled with garbage: it contains 5 legit images, 
which adds to the confusion.
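
For comparison, the kind of choice that `last_batch_handle` gives the user can be sketched in plain Python. This only illustrates the semantics of two of its modes, `'pad'` and `'discard'`; it is not MXNet's implementation. The function name `iter_batches` is hypothetical, and filling the padded slots by wrapping around to the start of the dataset is an assumption.

```python
def iter_batches(items, batch_size, last_batch_handle="pad"):
    """Yield (batch, pad) pairs over `items`.

    'pad'     -> fill a short last batch with items from the start of
                 the dataset and report how many were added as `pad`.
    'discard' -> drop a short last batch entirely.
    """
    if last_batch_handle not in ("pad", "discard"):
        raise ValueError("unsupported last_batch_handle")
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        pad = batch_size - len(batch)  # 0 for every full batch
        if pad and last_batch_handle == "discard":
            return
        # Wrap around to the beginning to fill the missing slots.
        yield batch + items[:pad], pad


dataset = list(range(8))  # stands in for the 8 images in the report
print(list(iter_batches(dataset, 5, "pad")))
print(list(iter_batches(dataset, 5, "discard")))
```

With 8 items and a batch size of 5, the `'pad'` mode yields two batches, the second with `pad == 2`, while `'discard'` yields only the first full batch.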

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
