Have you moved all tensors onto the same device, including the label tensor?
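In case it helps, here is a minimal sketch of the kind of check I mean. It is framework-agnostic: `find_device_mismatches`, `FakeTensor` and the `device_of` accessor are hypothetical names made up for illustration and are not part of SINGA; you would pass in whatever your tensors actually expose for their device.

```python
def find_device_mismatches(named_tensors, device_of):
    """Return the names of tensors whose device differs from the first one's.

    named_tensors: list of (name, tensor) pairs.
    device_of: callable returning the device of a tensor (an assumption:
               how you read the device depends on the framework).
    """
    if not named_tensors:
        return []
    expected = device_of(named_tensors[0][1])
    return [name for name, t in named_tensors if device_of(t) != expected]


class FakeTensor(object):
    """Hypothetical stand-in for a real tensor; it only remembers its device."""

    def __init__(self, device):
        self.device = device


tensors = [
    ("act", FakeTensor("gpu0")),
    ("labels", FakeTensor("cpu")),  # e.g. labels left on the host
    ("dense_w", FakeTensor("gpu0")),
]
mismatched = find_device_mismatches(tensors, lambda t: t.device)
print(mismatched)
```

Any name that shows up in the result would need to be moved to the same device as the others before calling the loss.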
> On 9 Oct 2016, at 11:02 AM, Arash Shafiei <[email protected]> wrote:
>
> outputs = rnn.forward(model_pb2.kTrain, inputs)[0:-2]
> grads = []
> batch_loss = 0
> g_dense_w.set_value(0.0)
> g_dense_b.set_value(0.0)
> print 'outputs len', len(outputs)  # 128
> output = outputs[-1]
> act = dense.forward(model_pb2.kTrain, output)
> print 'output shape', output.shape  # (256, 28)
> print 'activation shape', act.shape  # (256, 6)
> print 'labels shape', labels.shape  # (256, 6)
> lvalue = lossfun.forward(model_pb2.kTrain, act, labels)
> batch_loss += lvalue.l1()  # aborts here with:
>
> [F d1009 t11:00:24 p23551:016
> /home/wuwf/work/incubator-singa/src/core/tensor/./tensor_math_cuda.h:344]
> Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0)
> CUBLAS_STATUS_MAPPING_ERROR
> Aborted (core dumped)
>
>> On Sun, Oct 9, 2016 at 10:55 AM, Wei Wang <[email protected]> wrote:
>> Could you please paste the relevant code leading to this error?
>>
>>> On 9 Oct 2016, at 10:32 AM, Arash Shafiei <[email protected]> wrote:
>>>
>>> Thanks, it worked.
>>>
>>> So far, I managed to do rnn::forward(...) but now I am stuck somewhere else.
>>>
>>> rnn::forward(...) returns a tensor (denoted as lvalue). I have to obtain
>>> the L1 norm using lvalue.l1().
>>>
>>> But I get this error:
>>> [F d1009 t10:30:14 p23056:-56
>>> /home/wuwf/work/incubator-singa/src/core/tensor/./tensor_math_cuda.h:344]
>>> Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0)
>>> CUBLAS_STATUS_MAPPING_ERROR
>>> Aborted (core dumped)
>>>
>>>> On Sat, Oct 8, 2016 at 9:43 PM, Wang Wei <[email protected]> wrote:
>>>> Actually, the char-rnn example is from type (4), where each rnn unit would
>>>> generate a prediction and has a ground truth label.
>>>>
>>>> For your model (type 2), you only need to use the y128 (of shape 256, 28)
>>>> from the rnn::forward() as the input to the dense layer. All other yi
>>>> should be ignored.
>>>> Consequently, you would have an output (denoted as o) of shape (256, 6)
>>>> from the dense layer, which is the prediction for the whole sequence (of
>>>> length 128).
>>>> By feeding the prediction o and the label into the loss layer, you can
>>>> compute the loss value and compute the gradient for o (denoted as o').
>>>> Backward propagating o' through the dense layer, you would get the
>>>> gradient for y128, denoted as y'128.
>>>>
>>>> The input of the rnn::backward() would be <y'1, y'2, ...y'128, hy', cy'>,
>>>> where only y'128 is a valid tensor. y'1, y'2 ... should be tensors with
>>>> value 0.
>>>>
>>>> Best,
>>>> Wei
>>>>
>>>>> On Sat, Oct 8, 2016 at 9:33 PM Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Thanks. It worked.
>>>>>
>>>>> I am now at the phase of evaluating the loss.
>>>>>
>>>>> singa.loss.SoftmaxCrossEntropy has a forward function where it takes
>>>>> prediction tensors and ground truth.
>>>>>
>>>>> My problem now is that the prediction is a sequence and my label is not a
>>>>> sequence.
>>>>>
>>>>> Your char-rnn example is an application of type (1) in the figure below,
>>>>> but activity recognition is an application of type (2).
>>>>>
>>>>> <rnn-app.png>
>>>>>
>>>>> Therefore for each sequence in a batch I have only 1 label (although
>>>>> this label can be of one dimension from the set of {1,2,3,4,5,6} or of 6
>>>>> dimensions from the set of { [1,0,0,0,0,0], [0,1,0,0,0,0], etc. }).
>>>>>
>>>>> So now I need predictions and ground truth. The prediction for me is of
>>>>> shape
>>>>> (128, 256, 28)
>>>>> where 128 is the length of the sequence, 256 is the batch size and 28 is
>>>>> the hidden layer size.
>>>>>
>>>>> And my ground truth is of shape
>>>>> (256, 1) or (256, 6) -- depending on how you model it.
>>>>>
>>>>> But as I understood from the example of char-rnn my ground truth must be
>>>>> of shape:
>>>>> (128, 256)
>>>>>
>>>>> Would you have any insight about this?
>>>>> Thanks.
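The two shape questions in the messages above (one label per sequence, and a gradient list in which only y'128 is non-zero) can be sketched with plain NumPy. This is only an illustration under stated assumptions: `one_hot` and `grads_for_last_step` are hypothetical helper names, labels are assumed to be integers in 1..6, and the hy'/cy' entries of the backward input are left out.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Map integer labels in 1..num_classes to a (batch, num_classes) float32 array."""
    labels = np.asarray(labels).reshape(-1).astype(np.int64)
    out = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # labels start at 1, column indices start at 0
    out[np.arange(labels.shape[0]), labels - 1] = 1.0
    return out


def grads_for_last_step(grad_last, seq_len):
    """Build [y'1, ..., y'T] where every entry is zero except the last one."""
    zeros = [np.zeros_like(grad_last) for _ in range(seq_len - 1)]
    return zeros + [grad_last]


y = one_hot([1, 6, 3], 6)                       # one label per sequence
grad_y128 = np.ones((256, 28), dtype=np.float32)  # placeholder for y'128
grads = grads_for_last_step(grad_y128, 128)
print(y.shape, len(grads), grads[0].shape)
```

Each array would still have to be converted to a tensor and placed on the same device as the layers before being handed to the loss or to rnn::backward().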
>>>>>
>>>>> On Sat, Oct 8, 2016 at 6:42 PM, Wang Wei <[email protected]> wrote:
>>>>> Currently, numpy arrays of dtype=np.float32 or np.int can be converted
>>>>> into singa tensors.
>>>>> Please convert the numpy array into np.float32 and then call
>>>>> tensor.from_numpy(t) (without dtype=np.float32).
>>>>>
>>>>> On Sat, Oct 8, 2016 at 6:36 PM Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> The values that I have are floating points in [-1, 1].
>>>>>
>>>>> While using tensor.from_numpy(...), I was getting this error:
>>>>>
>>>>> Not implemented yet for float64
>>>>>
>>>>> I understood from the tutorial that we could pass the data type:
>>>>> y = tensor.from_numpy(..., dtype=np.float32)
>>>>> But using dtype, I am getting another error:
>>>>>
>>>>> TypeError: from_numpy() got an unexpected keyword argument 'dtype'
>>>>>
>>>>> On Sat, Oct 8, 2016 at 3:45 PM, Wang Wei <[email protected]> wrote:
>>>>> Hi
>>>>>
>>>>> According to the API of forward function:
>>>>> http://singa.apache.org/en/docs/layer.html#singa.layer.RNN.forward
>>>>> The input should be a vector of Tensors, <x1, x2, ... x128, hx, cx>, xi
>>>>> is of shape (1500, 9), hx and cx are optional whose shape should be
>>>>> (1500, 28).
>>>>> The output would be a vector of Tensors, <y1, y2, ..., y128, hy, cy>, yi
>>>>> is of shape (1500, 28), hy and cy are optional depending on the existence
>>>>> of hx and cx.
>>>>> If you want to put the dense layer on top of the last rnn unit (i.e. the
>>>>> 128-th), then you feed y128 to the dense layer.
>>>>>
>>>>> function convert just reshapes the raw data into a sequence of tensors
>>>>> <x1, x2, ..>.
>>>>>
>>>>> BTW, typically, people would use a smaller batchsize e.g. less than 256.
>>>>>
>>>>> May I forward our discussion to the incubator email list in case others
>>>>> have similar problems?
>>>>> Thanks.
>>>>>
>>>>> Best,
>>>>> Wei
>>>>>
>>>>> So here is what I have:
>>>>>
>>>>> input batch of dimension (1500, 128, 9)
>>>>> This means a batch of 1500 windows each having 128 vectors of 9 dimensions.
>>>>>
>>>>> input label of dimension (1500, 6)
>>>>> This means a label batch of 1500 vectors of 6 dimensions. This is to label
>>>>> if the person is sitting ([1,0,0,0,0,0]) or standing ([0,1,0,0,0,0]), etc.
>>>>>
>>>>> I am creating an lstm layer with hidden_size=28 and
>>>>> input_sample_shape=(9,) and num_stacks=1
>>>>>
>>>>> Then I create a dense layer with num_output=6 and input_sample_shape=(28,)
>>>>>
>>>>> Now I would like to feed the data to the 'forward' function of lstm and
>>>>> dense layer. But I could not make it work and I could not quite understand
>>>>> from the example what 'convert' and 'numpy2tensors' are supposed to do...
>>>>>
>>>>> I would appreciate your comments.
>>>>>
>>>>> On Sun, Sep 25, 2016 at 12:23 PM, Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Yes, I was thinking of a batch size of 32.
>>>>>
>>>>> Thanks. I am getting a better idea of how it works and I am wondering how
>>>>> RNN would be helpful, because we do not want to predict a sequence. We just
>>>>> have a sequence (in raw data) and a set of features (in processed data) and
>>>>> we want to know the classification.
>>>>>
>>>>> So I was thinking of using other approaches with SINGA. I understood that
>>>>> there is also MLP. We could use MLP from SINGA to see the result first.
>>>>>
>>>>> In this case input would be a set of 561 values with a label.
>>>>> Then the MLP, given a set of test data with 561 features would predict
>>>>> the label.
>>>>>
>>>>> Thanks for the advice.
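For the batch of shape (1500, 128, 9) discussed above, the reshaping into a per-time-step sequence <x1, x2, ...> can be sketched in NumPy as follows. This is not the `convert` or `numpy2tensors` code from the char-rnn example; `batch_to_steps` is a hypothetical name, and the sketch uses a batch of 32 only to stay small. The cast to float32 follows the advice earlier in the thread about `tensor.from_numpy`.

```python
import numpy as np


def batch_to_steps(batch):
    """Split a (batch, seq_len, features) array into seq_len arrays of (batch, features)."""
    # NumPy produces float64 by default; cast once up front
    batch = np.asarray(batch, dtype=np.float32)
    # Put the time axis first so that iterating yields one time step at a time
    steps = np.moveaxis(batch, 1, 0)
    return [np.ascontiguousarray(s) for s in steps]


raw = np.random.uniform(-1.0, 1.0, size=(32, 128, 9))  # float64 input
steps = batch_to_steps(raw)
print(len(steps), steps[0].shape, steps[0].dtype)
```

Each element of `steps` is then a float32 array that could be converted to a tensor one by one, giving the list of 128 inputs that the RNN forward call expects.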
>>>>>
>>>>> On Sun, Sep 25, 2016 at 12:03 PM, Wang Wei <[email protected]>
>>>>> wrote:
>>>>>
>>>>> On Sun, Sep 25, 2016 at 9:37 AM, Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Hi Wang Wei,
>>>>>
>>>>> I am trying to understand the char-rnn example, but there is still
>>>>> something that I am missing and cannot figure it out by myself.
>>>>>
>>>>> The convert function creates two numpy arrays x and y. As I understood the
>>>>> array x is the data and array y holds the labels.
>>>>>
>>>>> I checked the dimensions of these arrays.
>>>>> x.shape is (32, 100, 101)
>>>>> y.shape is (32, 100)
>>>>>
>>>>> 32 is the batch size
>>>>> 100 is the sequence size
>>>>> 101 is the vocabulary size, i.e. there are 101 unique chars in the
>>>>> linux_input.txt. each input from one sample and at one time step is a
>>>>> one-hot vector with all positions being 0 except the position of the
>>>>> character (set to 1).
>>>>>
>>>>> given a sequence of chars, a,b,c,d,e,f
>>>>> if the input (x) is a, b, c, d, e
>>>>> then the label is b, c, d, e, f
>>>>>
>>>>> In my understanding you are taking a batch of 100 characters and the next
>>>>> character must be the label. So according to my understanding
>>>>> x.shape must be (32, 100)
>>>>> y.shape must be (32, 1)
>>>>>
>>>>> I mean that you have a batch of 32 samples to train and each sample is a
>>>>> series of 100 characters. For each sample, there must be a label, which
>>>>> says what character must follow this series. And that character is only 1.
>>>>>
>>>>> Is there anything that I do not quite understand?
>>>>>
>>>>> I would need this information in order to modify your sample program for
>>>>> the activity recognition.
>>>>> So ultimately in my use case:
>>>>> x.shape probably is (32, 561)
>>>>> y.shape probably is (32, 1)
>>>>>
>>>>> For your case, if you use 561 features, then how about the sequence
>>>>> length? Is 32 the batchsize?
>>>>> 561 are floating point features which are in [-1, 1].
>>>>> 1 is the label which is in [1,2,3,4,5,6]
>>>>>
>>>>> I would appreciate your help.
>>>>> Thanks.
>>>>>
>>>>> On Sat, Sep 24, 2016 at 1:59 PM, Wang Wei <[email protected]> wrote:
>>>>> No. Don't average them.
>>>>> xij is a vector of 6 values. You can normalize them using standard
>>>>> normalization methods.
>>>>>
>>>>> On Sat, Sep 24, 2016 at 1:54 PM, Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Thanks for the analysis. I appreciate it.
>>>>>
>>>>> There is only one thing:
>>>>> The activities do not seem to be continuous for a person. It is like
>>>>> people are told to walk for a fixed period and 128 samples in R^6 are
>>>>> collected. Then people are told to sit, etc.
>>>>>
>>>>> So the person is not the focus and the focus is one activity.
>>>>>
>>>>> We are currently working on the first approach you proposed and will see
>>>>> the result.
>>>>>
>>>>> Later, we would like to try the second approach. My only concern was that
>>>>> xi0, xi1, ... are in R^6 and you propose to concatenate them. Since they
>>>>> are floating points I do not know how concatenation would work. Even if
>>>>> we average, we would lose lots of information. We will think about it.
>>>>>
>>>>> Thanks for your advice.
>>>>>
>>>>> On Sat, Sep 24, 2016 at 1:27 PM, Wang Wei <[email protected]> wrote:
>>>>> Let's denote xij \in R^6 for the j-th time point of the i-th activity of
>>>>> a person,
>>>>> let yi \in R^561 for the i-th activity of a person.
>>>>>
>>>>> If the activities of a person are continuous, then you have two approaches
>>>>> 1. use y0, y1, y2, .... (all activities of a person) as input, and use
>>>>> the labels l0, l1, l2... as the corresponding output of the RNN. The RNN
>>>>> needs to output a label for each activity.
>>>>> 2. use the raw data, xi0, xi1, xi2.... (all information from an activity)
>>>>> as the input, and use the label li as the output of the RNN. The RNN
>>>>> needs to output a label for all time points of one activity.
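On the "standard normalization methods" mentioned above for the raw xij vectors: one common choice is per-channel standardization (z-scoring), sketched below in NumPy. The thread does not say which method was meant, so this is just one option; `standardize_channels` is a hypothetical name and the random array merely stands in for the accelerometer and gyroscope data.

```python
import numpy as np


def standardize_channels(x, eps=1e-8):
    """Z-score each channel of a (samples, time, channels) array.

    The mean and standard deviation are computed per channel over all
    samples and time points; eps guards against division by zero.
    """
    x = np.asarray(x, dtype=np.float32)
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / (std + eps), mean, std


rng = np.random.RandomState(0)
raw = rng.uniform(-1.0, 1.0, size=(200, 128, 6))  # placeholder for 6-channel data
normalized, mean, std = standardize_channels(raw)
print(normalized.shape, mean.shape)
```

The returned `mean` and `std` would normally be computed on the training windows only and then reused unchanged for the test windows.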
>>>>>
>>>>> On Sat, Sep 24, 2016 at 12:33 PM, Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Yes, in the raw data, for each labeled sample (activity) there are 128
>>>>> time points, each with 6 channels of floating point data. (acc-x, acc-y,
>>>>> acc-z, gyro-x, gyro-y, gyro-z)
>>>>>
>>>>> For each sample (activity) of 128 points of 6 channels, 561 features are
>>>>> generated.
>>>>>
>>>>> Each person performs almost 200 activities.
>>>>>
>>>>> On Sat, Sep 24, 2016 at 12:20 PM, Wang Wei <[email protected]>
>>>>> wrote:
>>>>> Do you mean that in the dataset, each sample (person) has 128 time points,
>>>>> each one with 6 channels?
>>>>> If so, I think you can concatenate all 6 channels into a single channel.
>>>>>
>>>>> On Sat, Sep 24, 2016 at 12:03 PM, Arash Shafiei <[email protected]>
>>>>> wrote:
>>>>> Hi Wang Wei,
>>>>>
>>>>> We were wondering if the input of RNN can have multiple channels.
>>>>>
>>>>> In the example that you have for text prediction, the only channel is the
>>>>> characters entering the network.
>>>>>
>>>>> Now if there are multiple time series, then the network needs multiple
>>>>> channels.
>>>>>
>>>>> For example the raw data coming from accelerometers and gyroscopes
>>>>> compose 6 time series. It means that the data can have 6 dimensions and
>>>>> therefore the input of network can have 6 channels.
>>>>>
>>>>> I verified the data set and it turns out that 561 features are generated
>>>>> from 128*6 raw data. So a sequence of samples has 128 values for acc-x,
>>>>> acc-y, acc-z, gyro-x, gyro-y, and gyro-z.
>>>>>
>>>>> As a result the 561 features are not time series anymore.
>>>>>
>>>>> We are thinking of:
>>>>> 1) Use a decision tree on the 561 processed features.
>>>>> 2) Use RNN for raw data.
>>>>>
>>>>> To use RNN for raw data, we would need channels for the input. Would this
>>>>> be possible with SINGA?
>>>>>
>>>>> Thanks.
>>>>>
