[ 
https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946428#comment-14946428
 ] 

ASF subversion and git services commented on SINGA-82:
------------------------------------------------------

Commit 5f010caabd7c09cd9fabee666d93a36377639270 in incubator-singa's branch 
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=5f010ca ]

SINGA-82 Refactor input layers using data store abstraction

* Add StoreLayer to read data from Store, e.g., KVFile, TextFile (will add 
support for HDFS later).
* Implement subclasses of StoreLayer to parse different format tuples, e.g., 
SingleLabelImageRecord or CSV line.
* Update examples to use the new input layers.
* Add unit tests.
* Add a function to the Layer class that returns a vector<AuxType> of auxiliary 
data (e.g., label).

TODO
1. Make AuxType a template argument of the Layer class, and extend data() to 
return a vector of Blob for multiple dense features.
2. Separate layer classes into different files to make the structure of the 
source folder clear.
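
A minimal sketch of the CSV-parsing idea from the commit message above, assuming 
each value has the form "label,f1,f2,...". The class name CSVParserSketch, the 
accessor name aux_data(), and the choice of int for AuxType are hypothetical and 
are not taken from the SINGA source; the real StoreLayer subclasses may differ.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumption: auxiliary data is a single integer label per instance.
using AuxType = int;

// Hypothetical stand-in for a StoreLayer subclass that parses CSV lines.
class CSVParserSketch {
 public:
  // Parse `value` ("label,f1,f2,...") as the k-th instance of a mini-batch.
  // Returns false if the line is empty or a field is not numeric.
  bool Parse(int k, const std::string& key, const std::string& value) {
    (void)key;  // a real layer might parse the key into a line ID
    if (k < 0) return false;
    std::istringstream in(value);
    std::string field;
    if (!std::getline(in, field, ',')) return false;
    int label = 0;
    std::vector<float> row;
    try {
      label = std::stoi(field);
      while (std::getline(in, field, ',')) row.push_back(std::stof(field));
    } catch (const std::exception&) {
      return false;  // non-numeric or out-of-range field
    }
    const std::size_t idx = static_cast<std::size_t>(k);
    if (idx >= aux_.size()) {
      aux_.resize(idx + 1);
      data_.resize(idx + 1);
    }
    aux_[idx] = label;
    data_[idx] = std::move(row);
    return true;
  }

  // Auxiliary data (here: labels), one entry per parsed instance.
  const std::vector<AuxType>& aux_data() const { return aux_; }
  // Dense features, one row per parsed instance.
  const std::vector<std::vector<float>>& data() const { return data_; }

 private:
  std::vector<AuxType> aux_;
  std::vector<std::vector<float>> data_;
};
```

Keeping labels in a separate vector from the dense features mirrors the split 
between data() and the auxiliary-data function mentioned in the commit message.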


> Refactor input layers using data store abstraction
> --------------------------------------------------
>
>                 Key: SINGA-82
>                 URL: https://issues.apache.org/jira/browse/SINGA-82
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>            Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer 
> to read data from one storage, e.g., ShardData, CSV, LMDB. One problem is 
> that only read operations are provided. When users prepare the training data, 
> they have to get familiar with the read/write operations for each storage. 
> Inspired by caffe::db::DB, we can provide a storage abstraction with 
> simple read/write operation interfaces. Then users call these operations to 
> prepare their training data. Particularly, training data is stored as (string 
> key, string value) tuples. The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> The specific storage, e.g., CSV, LMDB, image folder or HDFS (will be 
> supported soon), inherits Store and overrides the functions. 
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from 
> Hadoop) can read from different sources by configuring the *store* field 
> (e.g., store=csv). 
> With the Store class, we can implement a KVInputLayer to read batchsize 
> tuples in its ComputeFeature function. The tuple is parsed by a virtual 
> function depending on the application (or the format of the tuple). 
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse 
> the label and feature from the value field. An ImageKVInputLayer may parse a 
> SingleLabelImageRecord from the value field.
> 2. There will be a set of layers doing data preprocessing, e.g., normalization 
> and image augmentation. 
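
The Store abstraction described in the quoted issue can be sketched roughly as 
follows. The three virtual functions follow the signatures quoted above; the 
Mode enumerators and the MemoryStore class are assumptions made only for 
illustration (the issue names reading, writing and appending but not the enum 
values), and Open is made pure virtual here for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mode names; the issue only says "reading, writing or appending".
enum class Mode { kRead, kCreate, kAppend };

// Base class with the three operations listed in the issue description.
class Store {
 public:
  virtual ~Store() = default;
  // Open the store for reading, writing or appending.
  virtual bool Open(const std::string& source, Mode mode) = 0;
  // Read the next tuple; returns false when no tuple is available.
  virtual bool Read(std::string* key, std::string* value) = 0;
  // Write one tuple; returns false if the store is not writable.
  virtual bool Write(const std::string& key, const std::string& value) = 0;
};

// Illustrative in-memory backend (not part of SINGA). A CSV, LMDB or HDFS
// backend would override the same three functions.
class MemoryStore : public Store {
 public:
  bool Open(const std::string& /*source*/, Mode mode) override {
    mode_ = mode;
    if (mode == Mode::kCreate) tuples_.clear();  // start from an empty store
    pos_ = 0;                                    // rewind the read cursor
    return true;
  }

  bool Read(std::string* key, std::string* value) override {
    if (mode_ != Mode::kRead || pos_ >= tuples_.size()) return false;
    *key = tuples_[pos_].first;
    *value = tuples_[pos_].second;
    ++pos_;
    return true;
  }

  bool Write(const std::string& key, const std::string& value) override {
    if (mode_ == Mode::kRead) return false;
    tuples_.emplace_back(key, value);
    return true;
  }

 private:
  Mode mode_ = Mode::kRead;
  std::size_t pos_ = 0;
  std::vector<std::pair<std::string, std::string>> tuples_;
};
```

With this shape, data preparation code and an input layer can both be written 
against Store alone, and the backend is chosen by configuration.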



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
