Have you moved all tensors onto the same device? Including the tensor for the
labels.


> On 9 Oct 2016, at 11:02 AM, Arash Shafiei <[email protected]> wrote:
> 
> outputs = rnn.forward(model_pb2.kTrain, inputs)[0:-2]
> grads = []
> batch_loss = 0
> g_dense_w.set_value(0.0)
> g_dense_b.set_value(0.0)
> print 'outputs len', len(outputs)  # prints 128
> output = outputs[-1]
> act = dense.forward(model_pb2.kTrain, output)
> print 'output shape', output.shape  # prints (256, 28)
> print 'activation shape', act.shape  # prints (256, 6)
> print 'labels shape', labels.shape  # prints (256, 6)
> lvalue = lossfun.forward(model_pb2.kTrain, act, labels)
> batch_loss += lvalue.l1()  # fails with: [F d1009 t11:00:24 p23551:016 
> /home/wuwf/work/incubator-singa/src/core/tensor/./tensor_math_cuda.h:344] 
> Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) 
> CUBLAS_STATUS_MAPPING_ERROR
> Aborted (core dumped)
> 
> 
> 
> 
>> On Sun, Oct 9, 2016 at 10:55 AM, Wei Wang <[email protected]> wrote:
>> Could you please paste the relevant code leading to this error?
>> 
>> 
>> 
>>> On 9 Oct 2016, at 10:32 AM, Arash Shafiei <[email protected]> wrote:
>>> 
>>> Thanks, it worked.
>>> 
>>> So far, I managed to do rnn::forward(...) but now I am stuck somewhere else.
>>> 
>>> rnn::forward(...) returns a tensor (denoted as lvalue). I have to obtain 
>>> the L1 norm using lvalue.l1().
>>> 
>>> But I get this error:
>>> [F d1009 t10:30:14 p23056:-56 
>>> /home/wuwf/work/incubator-singa/src/core/tensor/./tensor_math_cuda.h:344] 
>>> Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) 
>>> CUBLAS_STATUS_MAPPING_ERROR
>>> Aborted (core dumped)
>>> 
>>>> On Sat, Oct 8, 2016 at 9:43 PM, Wang Wei <[email protected]> wrote:
>>>> Actually, the char-rnn example is from type (4), where each rnn unit would 
>>>> generate a prediction and has a ground truth label.
>>>> 
>>>> For your model (type 2), you only need to use the y128 (of shape 256, 28) 
>>>> from the rnn::forward() as the input to the dense layer. All other yi 
>>>> should be ignored.
>>>> Consequently, you would have an output (denoted as o) of shape (256, 6) 
>>>> from the dense layer, which is the prediction for the whole sequence (of 
>>>> length 128).
>>>> By feeding the prediction o and the label into the loss layer, you can 
>>>> compute the loss value and compute the gradient for o (denoted as o').
>>>> Backward propagating the o through the dense layer, you would get the 
>>>> gradient for y128, denoted as y'128.
>>>> 
>>>> The input of the rnn::backward() would be <y'1, y'2, ... y'128, hy', cy'>, 
>>>> where only y'128 is a valid tensor; y'1, y'2, ... should be tensors filled 
>>>> with value 0.
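A numpy-level sketch of assembling that gradient list (shapes taken from this thread; the variable names are hypothetical, and in practice each entry would be a singa tensor on the same device):

```python
import numpy as np

seq_len, batch, hidden = 128, 256, 28

# Gradient coming back from the dense layer for the last rnn output (y'128);
# random values here are just a stand-in.
dy_last = np.random.randn(batch, hidden).astype(np.float32)

# All earlier rnn units receive zero gradients of the same shape.
y_grads = [np.zeros((batch, hidden), dtype=np.float32) for _ in range(seq_len - 1)]
y_grads.append(dy_last)

assert len(y_grads) == seq_len
assert all(g.shape == (batch, hidden) for g in y_grads)
assert y_grads[0].sum() == 0.0
```

The list (plus hy' and cy' tensors) would then be what rnn::backward() consumes.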
>>>> 
>>>> Best,
>>>> Wei
>>>> 
>>>> 
>>>>> On Sat, Oct 8, 2016 at 9:33 PM Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Thanks. It worked.
>>>>> 
>>>>> I am now at the phase of evaluating the loss.
>>>>> 
>>>>> singa.loss.SoftmaxCrossEntropy has a forward function where it takes 
>>>>> prediction tensors and ground truth.
>>>>> 
>>>>> My problem now is that the prediction is a sequence and my label is not a 
>>>>> sequence.
>>>>> 
>>>>> Your char-rnn example is an application of type (1) in the figure below, 
>>>>> but activity recognition is an application of type (2).
>>>>> 
>>>>> 
>>>>> <rnn-app.png>
>>>>> Therefore for each sequence in a batch I have only 1 label (although 
>>>>> this label can be one-dimensional, from the set {1,2,3,4,5,6}, or 
>>>>> 6-dimensional, from the set { [1,0,0,0,0,0], [0,1,0,0,0,0], etc. }).
>>>>> 
>>>>> So now I need predictions and ground truth. The prediction for me is of 
>>>>> shape
>>>>> (128, 256, 28)
>>>>> where 128 is the length of the sequence, 256 is the batch size and 28 is 
>>>>> the hidden layer size.
>>>>> 
>>>>> And my ground truth is of shape
>>>>> (256, 1) or (256, 6), depending on how you model it.
>>>>> 
>>>>> But as I understood from the example of char-rnn my ground truth must be 
>>>>> of shape:
>>>>> (128, 256)
>>>>> 
>>>>> Would you have any insight about this?
>>>>> Thanks..
>>>>> 
>>>>> 
>>>>> On Sat, Oct 8, 2016 at 6:42 PM, Wang Wei <[email protected]> wrote:
>>>>> Currently, numpy arrays of dtype np.float32 or np.int can be converted 
>>>>> into singa tensors.
>>>>> Please convert the numpy array to np.float32 and then call 
>>>>> tensor.from_numpy(t) (without dtype=np.float32).
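A minimal sketch of that conversion; the from_numpy call is left as a comment since it needs SINGA, and the numpy dtype conversion is the part that matters:

```python
import numpy as np

# numpy creates float64 arrays by default
x = np.random.uniform(-1.0, 1.0, size=(1500, 128, 9))
assert x.dtype == np.float64

# convert to float32 first, then hand the array to SINGA:
x = x.astype(np.float32)
# t = tensor.from_numpy(x)   # SINGA call from this thread; no dtype keyword

assert x.dtype == np.float32
```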
>>>>> 
>>>>> On Sat, Oct 8, 2016 at 6:36 PM Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> The values that I have are floating points [-1 1].
>>>>> 
>>>>> While using tensor.from_numpy(...), I was getting this error:
>>>>> 
>>>>> Not implemented yet for  float64
>>>>> 
>>>>> I understood from the tutorial that we could pass the data type:
>>>>> y = tensor.from_numpy(..., dtype=np.float32)
>>>>> But using dtype, I am getting another error:
>>>>> 
>>>>> TypeError: from_numpy() got an unexpected keyword argument 'dtype'
>>>>> 
>>>>> 
>>>>> On Sat, Oct 8, 2016 at 3:45 PM, Wang Wei <[email protected]> wrote:
>>>>> Hi 
>>>>> 
>>>>> According to the API of forward function: 
>>>>> http://singa.apache.org/en/docs/layer.html#singa.layer.RNN.forward
>>>>> The input should be a vector of Tensors, <x1, x2, ..., x128, hx, cx>, where 
>>>>> xi is of shape (1500, 9); hx and cx are optional, and their shape should be 
>>>>> (1500, 28).
>>>>> The output would be a vector of Tensors, <y1, y2, ..., y128, hy, cy>, where 
>>>>> yi is of shape (1500, 28); hy and cy are returned depending on the existence 
>>>>> of hx and cx.
>>>>> If you want to put the dense layer on top of the last rnn unit (i.e. the 
>>>>> 128-th), then you feed y128 to the dense layer.
>>>>> 
>>>>> The convert function just reshapes the raw data into a sequence of tensors 
>>>>> <x1, x2, ...>.
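A numpy sketch of what such a convert could produce for this dataset (the shapes come from the thread; the exact slicing is an assumption, not SINGA's implementation):

```python
import numpy as np

def convert(batch):
    # Split a (batch_size, seq_len, feat) array into seq_len arrays of shape
    # (batch_size, feat), one per time step, ready to wrap as singa tensors.
    return [np.ascontiguousarray(batch[:, t, :]) for t in range(batch.shape[1])]

raw = np.zeros((1500, 128, 9), dtype=np.float32)
xs = convert(raw)  # <x1, x2, ..., x128>

assert len(xs) == 128
assert xs[0].shape == (1500, 9)
```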
>>>>> 
>>>>> BTW, typically, people would use a smaller batch size, e.g. less than 256.
>>>>> 
>>>>> May I forward our discussion to the incubator email list in case others 
>>>>> have similar problems? 
>>>>> Thanks.
>>>>> 
>>>>> Best,
>>>>> Wei
>>>>> 
>>>>> So here is what I have:
>>>>> 
>>>>> input batch of dimension (1500, 128, 9)
>>>>> This means a batch of 1500 windows, each having 128 vectors of 9 dimensions.
>>>>> 
>>>>> input label of dimension (1500, 6) 
>>>>> This means a label batch of 1500 vectors of 6 dimensions. This labels whether 
>>>>> the person is sitting ([1,0,0,0,0,0]) or standing ([0,1,0,0,0,0]), etc.
>>>>> 
>>>>> I am creating an lstm layer with hidden_size=28 and 
>>>>> input_sample_shape=(9,) and num_stacks=1
>>>>> 
>>>>> Then I create a dense layer with num_output=6 and input_sample_shape=(28,)
>>>>> 
>>>>> Now I would like to feed the data to the 'forward' function of the lstm and 
>>>>> dense layers. But I could not make it work, and I could not quite understand 
>>>>> from the example what 'convert' and 'numpy2tensors' are supposed to do...
>>>>> 
>>>>> I would appreciate your comments..
>>>>> 
>>>>> On Sun, Sep 25, 2016 at 12:23 PM, Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Yes, I was thinking of batch size to be 32.
>>>>> 
>>>>> Thanks. I am getting a better sense of how it works, and I am thinking about 
>>>>> how an RNN would be helpful, because we do not want to predict a sequence. We 
>>>>> just have a sequence (in raw data) and a set of features (in processed data), 
>>>>> and we want to know the classification.
>>>>> 
>>>>> So I was thinking of using other approaches with SINGA. I understood that 
>>>>> there is also MLP. We could use MLP from SINGA to see the result first.
>>>>> 
>>>>> In this case input would be a set of 561 values with a label.
>>>>> Then the MLP, given a set of test data with 561 features would predict 
>>>>> the label.
>>>>> 
>>>>> Thanks for the advice..
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Sep 25, 2016 at 12:03 PM, Wang Wei <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> On Sun, Sep 25, 2016 at 9:37 AM, Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Hi Wang Wei,
>>>>> 
>>>>> I am trying to understand the char-rnn example, but there is still 
>>>>> something that I am missing and cannot figure out by myself.
>>>>> 
>>>>> The convert function creates two numpy arrays, x and y. As I understood, the 
>>>>> array x is the data and the array y holds the labels.
>>>>> 
>>>>> I checked the dimensions of these arrays. 
>>>>> x.shape is (32, 100, 101)
>>>>> y.shape is (32, 100)
>>>>> 
>>>>> 32 is the batch size
>>>>> 100 is the sequence size
>>>>> 101 is the vocabulary size, i.e. there are 101 unique chars in 
>>>>> linux_input.txt. Each input from one sample at one time step is a 
>>>>> one-hot vector with all positions being 0 except the position of the 
>>>>> character (set to 1).
>>>>> 
>>>>> 
>>>>> given a sequence of chars,   a,b,c,d,e,f
>>>>> if the input (x) is  a, b, c, d, e
>>>>> then the label is  b, c, d, e, f
>>>>> 
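That shift plus the one-hot encoding can be sketched in numpy (the char indices below are made up for illustration):

```python
import numpy as np

vocab_size = 101
seq = np.array([3, 7, 1, 7, 3, 9])  # indices of chars a, b, c, d, e, f (made up)

x_idx = seq[:-1]  # inputs: a, b, c, d, e
y = seq[1:]       # labels: b, c, d, e, f

# One-hot encode the inputs: one row per time step.
x = np.zeros((len(x_idx), vocab_size), dtype=np.float32)
x[np.arange(len(x_idx)), x_idx] = 1.0

assert x.shape == (5, 101)
assert x.sum() == 5.0              # exactly one 1 per time step
assert list(y) == [7, 1, 7, 3, 9]
```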
>>>>>  
>>>>> In my understanding you are taking a batch of 100 characters and the next 
>>>>> character must be the label. So according to my understanding,
>>>>> x.shape must be (32, 100)
>>>>> y.shape must be (32, 1)
>>>>> 
>>>>> I mean that you have a batch of 32 samples to train and each sample is a 
>>>>> series of 100 characters. For each sample, there must be a label, which 
>>>>> says what character must follow this series. And that character is only 1.
>>>>> 
>>>>> Is there anything that I do not quite understand?
>>>>> 
>>>>> I would need this information in order to modify your sample program for 
>>>>> the activity recognition.
>>>>> So ultimately in my use case:
>>>>> x.shape probably is (32, 561)
>>>>> y.shape probably is (32, 1) 
>>>>> 
>>>>> 
>>>>> For you case, if you use 561 features, then how about the sequence 
>>>>> length? Is 32 the batchsize? 
>>>>> The 561 values are floating point features, each in [-1, 1].
>>>>> The 1 is the label, which is in {1,2,3,4,5,6}.
>>>>> 
>>>>> I would appreciate your help.
>>>>> Thanks.
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 1:59 PM, Wang Wei <[email protected]> wrote:
>>>>> No. Don't average them.
>>>>> xij is a vector of 6 values. You can normalize them using standard 
>>>>> normalization methods.
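One standard choice is a per-channel z-score; a numpy sketch, assuming the (samples, time points, channels) layout discussed in this thread:

```python
import numpy as np

x = np.random.uniform(-1, 1, size=(200, 128, 6)).astype(np.float32)

# z-score each of the 6 channels across all samples and time points
mean = x.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, 6)
std = x.std(axis=(0, 1), keepdims=True)
x_norm = (x - mean) / (std + 1e-8)          # epsilon guards against a zero std

assert x_norm.shape == x.shape
assert abs(float(x_norm.mean())) < 1e-3     # near zero mean after normalization
```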
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 1:54 PM, Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Thanks for the analysis. I appreciate it.
>>>>> 
>>>>> There is only one thing:
>>>>> The activities do not seem to be continuous for a person. It is as if 
>>>>> people are told to walk for a fixed period and 128 samples in R^6 are 
>>>>> collected. Then people are told to sit, etc.
>>>>> 
>>>>> So the person is not the focus and the focus is one activity.
>>>>> 
>>>>> We are currently working on the first approach you proposed and will see 
>>>>> the results.
>>>>> 
>>>>> Later, we would like to try the second approach. My only concern was that 
>>>>> xi0, xi1, ... are in R^6 and you propose to concatenate them. Since they 
>>>>> are floating point values, I do not know how concatenation would work. Even 
>>>>> if we averaged, we would lose lots of information. We will think about it.
>>>>> 
>>>>> Thanks for your advice.
>>>>> 
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 1:27 PM, Wang Wei <[email protected]> wrote:
>>>>> Let's denote xij \in R^6 for the j-th time point of the i-th activity of 
>>>>> a person,
>>>>> and yi \in R^561 for the i-th activity of a person.
>>>>> 
>>>>> If the activities of a person are continuous, then you have two approaches:
>>>>> 1. use y0, y1, y2, ... (all activities of a person) as input, and use 
>>>>> the labels l0, l1, l2, ... as the corresponding output of the RNN. The RNN 
>>>>> needs to output a label for each activity.
>>>>> 2. use the raw data, xi0, xi1, xi2, ... (all information from an activity) 
>>>>> as the input, and use the label li as the output of the RNN. The RNN 
>>>>> needs to output a label for all time points of one activity.
>>>>> 
>>>>>  
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 12:33 PM, Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Yes, in the raw data, for each labeled sample (activity) there are 128 
>>>>> time points, each with 6 channels of floating point data. (acc-x, acc-y, 
>>>>> acc-z, gyro-x, gyro-y, gyro-z)
>>>>> 
>>>>> For each sample (activity) of 128 points of 6 channels, 561 features are 
>>>>> generated.
>>>>> 
>>>>> Each person performs almost 200 activities.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 12:20 PM, Wang Wei <[email protected]> 
>>>>> wrote:
>>>>> Do you mean that in the dataset, each sample (person) has 128 time points, 
>>>>> each with 6 channels?
>>>>> If so, I think you can concatenate all 6 channels into a single channel.
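A numpy sketch of that concatenation (the batch size of 32 and the 128 x 6 raw layout are the figures discussed in this thread):

```python
import numpy as np

x = np.random.randn(32, 128, 6).astype(np.float32)  # (batch, time points, channels)

# Flatten the 6 channel readings at each time point into one long vector,
# so each sample becomes a single-channel sequence of 128 * 6 = 768 values.
flat = x.reshape(32, 128 * 6)

assert flat.shape == (32, 768)
```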
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 12:03 PM, Arash Shafiei <[email protected]> 
>>>>> wrote:
>>>>> Hi Wang Wei,
>>>>> 
>>>>> We were wondering if the input of an RNN can have multiple channels.
>>>>> 
>>>>> In the example that you have for text prediction, the only channel is the 
>>>>> characters entering the network.
>>>>> 
>>>>> Now if there are multiple time series, then the network needs multiple 
>>>>> channels.
>>>>> 
>>>>> For example, the raw data coming from accelerometers and gyroscopes 
>>>>> compose 6 time series. It means that the data can have 6 dimensions and 
>>>>> therefore the input of the network can have 6 channels.
>>>>> 
>>>>> I verified the data set and it turns out that 561 features are generated 
>>>>> from 128*6 raw data. So a sequence of samples has 128 values for acc-x, 
>>>>> acc-y, acc-z, gyro-x, gyro-y, and gyro-z.
>>>>> 
>>>>> As a result the 561 features are not time series anymore. 
>>>>> 
>>>>> We are thinking of:
>>>>> 1) using a decision tree on the 561 processed features;
>>>>> 2) using an RNN on the raw data.
>>>>> 
>>>>> To use RNN for raw data, we would need channels for the input. Would this 
>>>>> be possible with SINGA?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>> 
>>> 
> 
