[
https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946428#comment-14946428
]
ASF subversion and git services commented on SINGA-82:
------------------------------------------------------
Commit 5f010caabd7c09cd9fabee666d93a36377639270 in incubator-singa's branch
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=5f010ca ]
SINGA-82 Refactor input layers using data store abstraction
* Add StoreLayer to read data from Store, e.g., KVFile, TextFile (will add
support for HDFS later).
* Implement subclasses of StoreLayer to parse tuples of different formats, e.g.,
SingleLabelImageRecord or CSV line.
* Update examples to use the new input layers.
* Add unit tests.
* Add a function to the Layer class that returns a vector<AuxType> of auxiliary
data (e.g., label).
TODO
1. make AuxType a template argument of Layer class, and extend data() to return
a vector of Blob for multiple dense features.
2. separate layer classes into different files to make the structure of the
source folder clear.
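A minimal sketch of the StoreLayer idea from the commit message, not the committed code: a base layer pulls up to batchsize (key, value) tuples and hands each one to a virtual Parse function, and a CSV subclass fills in the features and the auxiliary label. The names StoreLayerSketch, CSVLayerSketch, LoadBatch and aux_data are hypothetical, the tuple source is a plain vector standing in for a Store, and AuxType is assumed to be int.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Tuple = std::pair<std::string, std::string>;  // (key, value)

// Hypothetical stand-in for a store-backed input layer.
class StoreLayerSketch {
 public:
  explicit StoreLayerSketch(int batchsize) : batchsize_(batchsize) {}
  virtual ~StoreLayerSketch() = default;

  // Reads tuples from source starting at *pos until batchsize tuples are
  // parsed or the source is exhausted; returns the number parsed.
  int LoadBatch(const std::vector<Tuple>& source, std::size_t* pos) {
    assert(pos != nullptr);
    data_.assign(batchsize_, std::vector<float>());
    aux_.assign(batchsize_, 0);
    int k = 0;
    while (k < batchsize_ && *pos < source.size()) {
      const Tuple& t = source[(*pos)++];
      if (Parse(k, t.first, t.second)) ++k;  // skip tuples that fail to parse
    }
    return k;
  }

  const std::vector<std::vector<float>>& data() const { return data_; }
  // Auxiliary data (here: one integer label per instance).
  const std::vector<int>& aux_data() const { return aux_; }

 protected:
  // Parses one tuple as the k-th instance of the mini-batch.
  virtual bool Parse(int k, const std::string& key,
                     const std::string& value) = 0;

  int batchsize_;
  std::vector<std::vector<float>> data_;
  std::vector<int> aux_;
};

// Assumed CSV layout: "label,feature,feature,...". The key is ignored here.
class CSVLayerSketch : public StoreLayerSketch {
 public:
  using StoreLayerSketch::StoreLayerSketch;

 protected:
  bool Parse(int k, const std::string& /*key*/,
             const std::string& value) override {
    std::stringstream ss(value);
    std::string field;
    if (!std::getline(ss, field, ',')) return false;
    try {
      aux_[k] = std::stoi(field);
      data_[k].clear();
      while (std::getline(ss, field, ','))
        data_[k].push_back(std::stof(field));
    } catch (const std::exception&) {
      return false;  // malformed number in the line
    }
    return true;
  }
};
```

A subclass for another tuple format would only override Parse; the batching loop stays in the base class.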
> Refactor input layers using data store abstraction
> --------------------------------------------------
>
> Key: SINGA-82
> URL: https://issues.apache.org/jira/browse/SINGA-82
> Project: Singa
> Issue Type: Improvement
> Reporter: wangwei
> Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer
> to read data from one storage, e.g., ShardData, CSV, LMDB. One problem is
> that only read operations are provided. When users prepare the training data,
> they have to get familiar with the read/write operations for each storage.
> Inspired by caffe::db::DB, we can provide a storage abstraction with
> simple read/write operation interfaces. Then users call these operations to
> prepare their training data. Specifically, training data is stored as (string
> key, string value) tuples. The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> The specific storage, e.g., CSV, LMDB, image folder or HDFS (will be
> supported soon), inherits Store and overrides the functions.
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from
> Hadoop) can read from different sources by configuring the *store* field
> (e.g., store=csv).
> With the Store class, we can implement a KVInputLayer to read batchsize
> tuples in its ComputeFeature function. The tuple is parsed by a virtual
> function depending on the application (or the format of the tuple).
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse
> the label and feature from the value field. An ImageKVInputLayer may parse a
> SingleLabelImageRecord from the value field.
> 2. There will be a set of layers doing data preprocessing, e.g., normalization
> and image augmentation.
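The Store interface quoted in the issue description could be sketched roughly as follows. This is an illustration under assumptions, not SINGA's implementation: the Mode enumerators, the choice to make Open pure virtual, and the in-memory MemoryStore backend (with its RoundTripCount helper) are all hypothetical and only serve to show how one backend would override the three operations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical modes; the issue only mentions reading, writing or appending.
enum class Mode { kRead, kCreate, kAppend };

// Base class with the three operations listed in the issue description.
class Store {
 public:
  virtual ~Store() = default;
  // Open the store for reading, writing or appending.
  virtual bool Open(const std::string& source, Mode mode) = 0;
  // Read the next tuple; returns false when no tuple is available.
  virtual bool Read(std::string* key, std::string* value) = 0;
  // Write one tuple.
  virtual bool Write(const std::string& key, const std::string& value) = 0;
};

// Illustrative backend that keeps tuples in memory (source is ignored).
class MemoryStore : public Store {
 public:
  bool Open(const std::string& /*source*/, Mode mode) override {
    mode_ = mode;
    if (mode == Mode::kCreate) tuples_.clear();  // start from scratch
    pos_ = 0;
    opened_ = true;
    return true;
  }

  bool Read(std::string* key, std::string* value) override {
    assert(key != nullptr && value != nullptr);
    if (!opened_ || mode_ != Mode::kRead || pos_ >= tuples_.size())
      return false;
    *key = tuples_[pos_].first;
    *value = tuples_[pos_].second;
    ++pos_;
    return true;
  }

  bool Write(const std::string& key, const std::string& value) override {
    if (!opened_ || mode_ == Mode::kRead) return false;
    tuples_.emplace_back(key, value);
    return true;
  }

 private:
  std::vector<std::pair<std::string, std::string>> tuples_;
  std::size_t pos_ = 0;
  Mode mode_ = Mode::kRead;
  bool opened_ = false;
};

// Writes two tuples, reopens for reading, and counts what comes back.
int RoundTripCount() {
  MemoryStore store;
  store.Open("unused", Mode::kCreate);
  store.Write("0", "1,0.5,0.25");
  store.Write("1", "0,0.1,0.9");
  store.Open("unused", Mode::kRead);
  int n = 0;
  std::string k, v;
  while (store.Read(&k, &v)) ++n;
  return n;
}
```

Because data preparation and the input layer both go through the same base class, a user would write training tuples with Write and the layer would consume them with Read, regardless of which backend sits underneath.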