[
https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946427#comment-14946427
]
ASF subversion and git services commented on SINGA-82:
------------------------------------------------------
Commit d99b24cb75def9fdbdc59273c4297abb75813c36 in incubator-singa's branch
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=d99b24c ]
SINGA-82 Refactor input layers using data store abstraction
Add Store abstraction for reading (writing) data. Implemented two backends:
1. KVFile, previously named DataShard. It is a binary file in which each
tuple has a unique key.
2. TextFile, a plain text file in which each line is the value field of a
tuple (the key is the line number).
TODO: implement HDFS and an image folder as backends.
> Refactor input layers using data store abstraction
> --------------------------------------------------
>
> Key: SINGA-82
> URL: https://issues.apache.org/jira/browse/SINGA-82
> Project: Singa
> Issue Type: Improvement
> Reporter: wangwei
> Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer
> to read data from each storage, e.g., DataShard, CSV, LMDB. One problem is
> that only read operations are provided. When users prepare the training data,
> they have to become familiar with the read/write operations of each storage.
> Inspired by caffe::db::DB, we can provide a storage abstraction with
> simple read/write operation interfaces. Users then call these operations to
> prepare their training data. Specifically, training data is stored as (string
> key, string value) tuples. The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> The specific storage, e.g., CSV, LMDB, image folder or HDFS (will be
> supported soon), inherits Store and overrides the functions.
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from
> Hadoop) can read from different sources by configuring the *store* field
> (e.g., store=csv).
> With the Store class, we can implement a KVInputLayer that reads *batchsize*
> tuples in its ComputeFeature function. Each tuple is parsed by a virtual
> function that depends on the application (i.e., on the format of the tuple).
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse
> the label and features from the value field. An ImageKVInputLayer may parse
> a SingleLabelImageRecord from the value field.
> 2. There will also be a set of layers for data preprocessing, e.g.,
> normalization and image augmentation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)