[GitHub] [incubator-mxnet] sxjscience commented on issue #16955: [Dataset] add flatten API to dataset

GitBox Mon, 02 Dec 2019 15:11:24 -0800

sxjscience commented on issue #16955: [Dataset] add flatten API to dataset
URL: https://github.com/apache/incubator-mxnet/pull/16955#issuecomment-560853809
 
 
   Explicitly constructing the dataset may be a better choice than adding a 
`flatten` method.
   ```python
   new_dataset = preprocess_function(dataset)
   ```
   
   From my perspective, the major design choice of `gluon.dataset` is to 
support `__getitem__` + lazy evaluation in `transform()`. With lazy 
evaluation, we can generate the data on the fly, so the overall data processing 
pipeline uses less memory. However, the `flatten` method is equivalent to this 
Python one-liner, `SimpleDataset(list(itertools.chain.from_iterable(self)))`, 
which materializes every element eagerly, so there is no speed/memory benefit.
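
   A minimal sketch of that equivalence. To keep it runnable without MXNet, the 
`SimpleDataset` class below is a hypothetical stand-in that only mimics the 
list-wrapping behavior assumed of gluon's `SimpleDataset`; the free function 
`flatten` is likewise illustrative and not the API proposed in the PR.

```python
import itertools


class SimpleDataset:
    """Hypothetical stand-in for gluon's SimpleDataset: wraps an indexable sequence."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


def flatten(dataset):
    # list(...) walks every sample up front, so the whole flattened
    # dataset is held in memory; nothing here is evaluated lazily.
    return SimpleDataset(list(itertools.chain.from_iterable(dataset)))


nested = SimpleDataset([[1, 2], [3], [4, 5, 6]])
flat = flatten(nested)
```

Because the result is just a list wrapped in a dataset, a helper like this can live 
in user code without any change to the dataset class itself.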
   
   Moreover, consider the case where each sample is a `(data, label)` pair. 
Calling `flatten()` would make the dataset look like `[data0, label0, data1, 
label1, ...]`, which is not very meaningful.
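
   The effect on paired samples can be reproduced with plain Python tuples. The 
sample values below are placeholders chosen for illustration.

```python
import itertools

# Each sample is a (data, label) pair, as in a typical supervised dataset.
pairs = [("data0", "label0"), ("data1", "label1")]

# Flattening one level interleaves data and labels in a single sequence,
# losing the pairing between them.
flattened = list(itertools.chain.from_iterable(pairs))
```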
   
   I suggest we just use 
`SimpleDataset(list(itertools.chain.from_iterable(self)))` to implement this 
functionality.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
