This is an automated email from the ASF dual-hosted git repository.
skm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new d4adbd2 Refine the documentation of im2rec (#12606)
d4adbd2 is described below
commit d4adbd246998dd1c1ac0c425fe048461d7a22915
Author: Jake Lee <[email protected]>
AuthorDate: Wed Sep 19 21:30:21 2018 -0700
Refine the documentation of im2rec (#12606)
* update the example data format and link to each others
* fix wording
* delete the typo
---
docs/faq/recordio.md | 28 ++++++++++++++++------------
docs/tutorials/basic/data.md | 2 ++
2 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/docs/faq/recordio.md b/docs/faq/recordio.md
index 10ab6c7..f615718 100644
--- a/docs/faq/recordio.md
+++ b/docs/faq/recordio.md
@@ -6,35 +6,39 @@ RecordIO implements a file format for a sequence of records. We recommend storin
* Packing data together allows continuous reading on the disk.
* RecordIO has a simple way to partition, simplifying distributed setting. We
provide an example later.
-We provide the [im2rec
tool](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc) so you can
create an Image RecordIO dataset by yourself. The following walkthrough shows
you how.
+We provide the [im2rec
tool](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc) so you can
create an Image RecordIO dataset by yourself. The following walkthrough shows
you how. Note that there is a Python version of the [im2rec
tool](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py)
and an [example](https://mxnet.incubator.apache.org/tutorials/basic/data.html)
using real-world data.
### Prerequisites
+
Download the data. You don't need to resize the images manually. You can use
```im2rec``` to resize them automatically. For details, see the "Extension:
Using Multiple Labels for a Single Image," later in this topic.
### Step 1. Make an Image List File
+
+* Note that im2rec.py provides a `--list` parameter that generates the list
for you, but im2rec.cc doesn't support it.
+
After you download the data, you need to make an image list file. The format
is:
```
integer_image_index \t label_index \t path_to_image
```
Typically, the program takes the list of names of all of the images, shuffles
them, then separates them into two lists: a training filename list and a
testing filename list. Write the list in the right format.
-
This is an example file:
```bash
-95099 464 n04467665_17283.JPEG
-10025081 412 ILSVRC2010_val_00025082.JPEG
-74181 789 n01915811_2739.JPEG
-10035553 859 ILSVRC2010_val_00035554.JPEG
-10048727 929 ILSVRC2010_val_00048728.JPEG
-94028 924 n01980166_4956.JPEG
-1080682 650 n11807979_571.JPEG
-972457 633 n07723039_1627.JPEG
-7534 11 n01630670_4486.JPEG
-1191261 249 n12407079_5106.JPEG
+95099 464.000000 n04467665_17283.JPEG
+10025081 412.000000 ILSVRC2010_val_00025082.JPEG
+74181 789.000000 n01915811_2739.JPEG
+10035553 859.000000 ILSVRC2010_val_00035554.JPEG
+10048727 929.000000 ILSVRC2010_val_00048728.JPEG
+94028 924.000000 n01980166_4956.JPEG
+1080682 650.000000 n11807979_571.JPEG
+972457 633.000000 n07723039_1627.JPEG
+7534 11.000000 n01630670_4486.JPEG
+1191261 249.000000 n12407079_5106.JPEG
```
### Step 2. Create the Binary File
+
To generate a binary image, use `im2rec` in the tool folder. `im2rec` takes
the path of the `_image list file_` you generated, the `_root path_` of the
images, and the `_output file path_` as input. This process usually takes
several hours, so be patient.
Sample command:
diff --git a/docs/tutorials/basic/data.md b/docs/tutorials/basic/data.md
index 0a5dd59..b5d0884 100644
--- a/docs/tutorials/basic/data.md
+++ b/docs/tutorials/basic/data.md
@@ -315,6 +315,8 @@ print(mx.recordio.unpack_img(s))
You can also convert raw images into *RecordIO* format using the ``im2rec.py``
utility script that is provided in the MXNet
[src/tools](https://github.com/dmlc/mxnet/tree/master/tools) folder.
An example of how to use the script for converting to *RecordIO* format is
shown in the `Image IO` section below.
+* Note that there is a C++ version of
[im2rec](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc); refer to
the [RecordIO FAQ](https://mxnet.incubator.apache.org/faq/recordio.html) for more
information.
+
## Image IO
In this section, we will learn how to preprocess and load image data in MXNet.
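
As a rough illustration of the image list format described in Step 1 of the `recordio.md` hunk above (`integer_image_index \t label_index \t path_to_image`, with labels written as floats as in the updated example), the shuffle-and-split step might be sketched as follows. This is not how `im2rec.py --list` is implemented. The function names, the `train_ratio` parameter, and the fixed seed are hypothetical and chosen only for this example, which assumes one label per image.

```python
import random
from typing import List, Sequence, Tuple

# (image index, label, path relative to the image root); hypothetical alias
Entry = Tuple[int, float, str]


def split_image_list(
    labeled_paths: Sequence[Tuple[str, float]],
    train_ratio: float = 0.8,  # assumed split ratio, not taken from the docs
    seed: int = 0,
) -> Tuple[List[Entry], List[Entry]]:
    """Index, shuffle, and split (path, label) pairs into train/test entries."""
    entries = [
        (index, float(label), path)
        for index, (path, label) in enumerate(labeled_paths)
    ]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(entries)
    n_train = int(len(entries) * train_ratio)
    return entries[:n_train], entries[n_train:]


def format_entry(entry: Entry) -> str:
    """Render one entry as 'index<TAB>label<TAB>path'."""
    index, label, path = entry
    return f"{index}\t{label:f}\t{path}"


def parse_entry(line: str) -> Entry:
    """Parse a single-label list line back into (index, label, path)."""
    index, label, path = line.rstrip("\n").split("\t")
    return int(index), float(label), path


def write_image_list(entries: Sequence[Entry], list_path: str) -> None:
    """Write entries to a list file, one tab-separated line per image."""
    with open(list_path, "w") as list_file:
        for entry in entries:
            list_file.write(format_entry(entry) + "\n")
```

Calling `write_image_list` once for each half returned by `split_image_list` would produce separate training and testing list files in the tab-separated layout shown in the diff.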