sxjscience commented on issue #16955: [Dataset] add flatten API to dataset
URL: https://github.com/apache/incubator-mxnet/pull/16955#issuecomment-560853809

Explicitly constructing the dataset may be a better choice than adding a `flatten` method.

```python
new_dataset = preprocess_function(dataset)
```

From my perspective, the major design choice of `gluon.dataset` is to support `__getitem__` + lazy evaluation in `transform()`. With the help of lazy evaluation, we can generate the data on-the-fly and the overall data processing pipeline uses less memory. However, the `flatten` method is equivalent to this Python one-liner, `SimpleDataset(list(itertools.chain.from_iterable(self)))`, which materializes every element up front, so there is no speed/memory benefit.

Moreover, think about the case where each sample is a (data, label) pair. Calling `flatten()` will make the dataset look like `[data0, label0, data1, label1, ...]`, which is not very meaningful.

I suggest we just use `SimpleDataset(list(itertools.chain.from_iterable(self)))` to implement this functionality.
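To make the two points above concrete, here is a minimal sketch of the one-liner in action. It is an illustration only: the `SimpleDataset` class below is a small stand-in written for this example (assumed to need only `__len__` and `__getitem__`), not the real `mxnet.gluon.data.SimpleDataset`, and the helper name `flatten` is hypothetical.

```python
import itertools


class SimpleDataset:
    """Stand-in for gluon's SimpleDataset (assumption: wraps a list and
    exposes only __len__ and __getitem__)."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


def flatten(dataset):
    # Hypothetical helper: list(...) materializes every element eagerly,
    # so nothing about this step is lazy.
    return SimpleDataset(list(itertools.chain.from_iterable(dataset)))


# Case 1: each sample is a list of items, so flattening is meaningful.
nested = SimpleDataset([[1, 2], [3], [4, 5, 6]])
flat = flatten(nested)

# Case 2: each sample is a (data, label) pair, so data and labels
# end up interleaved in a single sequence.
pairs = SimpleDataset([("data0", "label0"), ("data1", "label1")])
interleaved = flatten(pairs)
```

In the second case the pairing between each data item and its label is lost, which is the concern raised about samples of the form (data, label).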
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
