[
https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946427#comment-14946427
]
ASF subversion and git services commented on SINGA-82:
------------------------------------------------------
Commit d99b24cb75def9fdbdc59273c4297abb75813c36 in incubator-singa's branch
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=d99b24c ]
SINGA-82 Refactor input layers using data store abstraction
Add Store abstraction for reading (writing) data. Implemented two backends:
1. KVFile, previously named DataShard. It is a binary file in which each
tuple has a unique key.
2. TextFile, a plain text file in which each line is the value field of a
tuple (the key is the line number).
TODO: implement HDFS and an image folder as backends.
> Refactor input layers using data store abstraction
> --------------------------------------------------
>
> Key: SINGA-82
> URL: https://issues.apache.org/jira/browse/SINGA-82
> Project: Singa
> Issue Type: Improvement
> Reporter: wangwei
> Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer
> to read data from each storage, e.g., DataShard, CSV, LMDB. One problem is
> that only read operations are provided. When users prepare the training data,
> they have to become familiar with the read/write operations of each storage.
> Inspired by caffe::db::DB, we can provide a storage abstraction with
> simple read/write operation interfaces. Users then call these operations to
> prepare their training data. Specifically, training data is stored as (string
> key, string value) tuples. The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> The specific storage, e.g., CSV, LMDB, image folder or HDFS (will be
> supported soon), inherits Store and overrides the functions.
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from
> Hadoop) can read from different sources by configuring the *store* field
> (e.g., store=csv).
> With the Store class, we can implement a KVInputLayer that reads *batchsize*
> tuples in its ComputeFeature function. Each tuple is parsed by a virtual
> function that depends on the application (i.e., on the format of the tuple).
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse
> the label and features from the value field. An ImageKVInputLayer may parse
> a SingleLabelImageRecord from the value field.
> 2. There will also be a set of layers for data preprocessing, e.g.,
> normalization and image augmentation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)