[ https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946430#comment-14946430 ]
ASF subversion and git services commented on SINGA-82:
------------------------------------------------------
Commit dc7f1996df26687612f61945bbb58ccbe0db65f4 in incubator-singa's branch
refs/heads/master from wang sheng
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=dc7f199 ]
SINGA-82 Refactor input layers using data store abstraction
add header guard for textfile_store.h, image_transform.h
format code
> Refactor input layers using data store abstraction
> --------------------------------------------------
>
> Key: SINGA-82
> URL: https://issues.apache.org/jira/browse/SINGA-82
> Project: Singa
> Issue Type: Improvement
> Reporter: wangwei
> Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer
> to read data from one storage, e.g., ShardData, CSV, LMDB. One problem is
> that only read operations are provided. When users prepare the training data,
> they have to get familiar with the read/write operations for each storage.
> Inspired by caffe::db::DB, we can provide a storage abstraction with
> simple read/write operation interfaces. Then users call these operations to
> prepare their training data. Particularly, training data is stored as (string
> key, string value) tuples. The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> The specific storage, e.g., CSV, LMDB, image folder or HDFS (will be
> supported soon), inherits Store and overrides the functions.
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from
> Hadoop) can read from different sources by configuring the *store* field (e.g.,
> store=csv).
> With the Store class, we can implement a KVInputLayer to read batchsize
> tuples in its ComputeFeature function. The tuple is parsed by a virtual
> function depending on the application (or the format of the tuple).
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse
> the label and feature from the value field. An ImageKVInputLayer may parse a
> SingleLabelImageRecord from the value field.
> 2. There will be a set of layers doing data preprocessing, e.g., normalization
> and image augmentation.
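
The Store and KVInputLayer design in item 1 of the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the Mode values, the in-memory MemoryStore backend, the simplified ComputeFeature() signature and the "label,f1,f2,..." value format are all hypothetical. Only the Read/Write/Open/Parse signatures are taken from the issue text, and Open is made pure virtual here for brevity.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using std::string;

// Assumed mode names; the issue only says "reading, writing or appending".
enum Mode { kRead, kCreate, kAppend };

// Base class with the three operations quoted in the issue.
class Store {
 public:
  virtual ~Store() = default;
  virtual bool Open(const string& source, Mode mode) = 0;
  virtual bool Read(string* key, string* value) = 0;
  virtual bool Write(const string& key, const string& value) = 0;
};

// Hypothetical in-memory backend standing in for CSV, LMDB, etc.
class MemoryStore : public Store {
 public:
  bool Open(const string& /*source*/, Mode mode) override {
    if (mode == kCreate) tuples_.clear();  // start from an empty store
    pos_ = 0;                              // rewind the read cursor
    return true;
  }
  bool Read(string* key, string* value) override {
    if (pos_ >= tuples_.size()) return false;  // no more tuples
    *key = tuples_[pos_].first;
    *value = tuples_[pos_].second;
    ++pos_;
    return true;
  }
  bool Write(const string& key, const string& value) override {
    tuples_.emplace_back(key, value);
    return true;
  }

 private:
  std::vector<std::pair<string, string>> tuples_;
  std::size_t pos_ = 0;
};

// Reads up to batchsize tuples from any Store and hands each one to Parse().
class KVInputLayer {
 public:
  KVInputLayer(Store* store, int batchsize)
      : store_(store), batchsize_(batchsize) {}
  virtual ~KVInputLayer() = default;

  // Simplified signature (assumption); returns the number of parsed tuples.
  int ComputeFeature() {
    string key, value;
    int k = 0;
    while (k < batchsize_ && store_->Read(&key, &value)) {
      if (!Parse(k, key, value)) break;
      ++k;
    }
    return k;
  }

 protected:
  // parse the tuple as the k-th instance for one mini-batch
  virtual bool Parse(int k, const string& key, const string& tuple) = 0;

 private:
  Store* store_;
  int batchsize_;
};

// Assumes each value is "label,f1,f2,..."; the key (line ID) is ignored.
class CSVKVInputLayer : public KVInputLayer {
 public:
  using KVInputLayer::KVInputLayer;
  std::vector<int> labels;
  std::vector<std::vector<float>> features;

 protected:
  bool Parse(int k, const string& /*key*/, const string& tuple) override {
    std::istringstream in(tuple);
    string field;
    if (!std::getline(in, field, ',')) return false;  // empty value
    if (static_cast<std::size_t>(k) >= labels.size()) {
      labels.resize(k + 1);
      features.resize(k + 1);
    }
    labels[k] = std::stoi(field);
    features[k].clear();
    while (std::getline(in, field, ',')) features[k].push_back(std::stof(field));
    return true;
  }
};
```

In this sketch, data preparation and training use the same interface: the user writes tuples with Write(), reopens the store for reading, and passes it to a layer, which fills one mini-batch per ComputeFeature() call. Swapping the backend only requires another Store subclass; the layer code is unchanged.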