jimmylao commented on issue #49:
URL: 
https://github.com/apache/incubator-bluemarlin/issues/49#issuecomment-1048165884


   @klonikar 
   
   1. What is context in the red cells in decoder? Is it used as input to next 
timestep?
   As shown in the 1st figure, the context in the red cells comes from the 
attention layer; it is concatenated with the state output of the current cell 
and then used as input to the next cell.
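   To make that concrete, here is a minimal sketch (hypothetical names and sizes, not the model's actual code) of how the next decoder input is formed from the previous cell's state output and the attention context:

```python
import numpy as np

# Hypothetical dimensions; the real model takes these from its hparams.
hidden_size, context_size = 4, 3

def next_decoder_input(prev_state_output, attn_context):
    """Concatenate the previous decoder cell's state output with the
    attention context vector to form the input of the next time step."""
    return np.concatenate([prev_state_output, attn_context], axis=-1)

state_out = np.zeros(hidden_size)       # output of the current red cell
context = np.ones(context_size)         # context from the attention layer
next_input = next_decoder_input(state_out, context)
# next_input (width hidden_size + context_size) feeds the next decoder cell
```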
   
   2. Can you explain split of train_window data into train_skip_first, data 
used for training and final eval? How is the data in the final 10 days of 82 
(12+60+10) days used in training? Is it used to update regularization params or 
weights in backprop?
   
![image](https://user-images.githubusercontent.com/60371672/155204831-02ce537a-e723-4dec-a806-052c7e78a4d8.png)
   As shown in the figure above, the data is typically split into 4 
configurable parts, i.e. train_skip_first, train_window, validate and 
back_offset. In your case,
   
   - train_skip_first = 12
   - train_window = 60
   - predict (validate) = 10
   - back_offset = 0
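   A hypothetical helper (not from the repo) showing how those four parts tile the 82-day timeline:

```python
def split_timeline(n_days, train_skip_first, train_window,
                   predict_window, back_offset):
    """Return (train_start, train_end) and (predict_start, predict_end)
    index ranges: skip the first days, train on train_window days, and
    validate on the predict_window days ending back_offset from the end."""
    train_start = train_skip_first
    train_end = train_start + train_window
    predict_end = n_days - back_offset
    predict_start = predict_end - predict_window
    return (train_start, train_end), (predict_start, predict_end)

# Your case: 82 = 12 (skipped) + 60 (train) + 10 (validate), back_offset = 0
train_rng, predict_rng = split_timeline(82, 12, 60, 10, 0)
# train_rng == (12, 72), predict_rng == (72, 82)
```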
   
   "Is it used to update regularization params or weights in backprop?" - I 
don't quite understand this question. The data is used either to train a model 
or to run a trained model for prediction; in the training stage, it is used to 
compute the gradient in each batch/epoch, which is then used in backprop.
   
   3. What are the features used at time step of 82 days? How is ts_n in 
trainready table used as input? I thought that's the output.
   
![image](https://user-images.githubusercontent.com/60371672/155207931-bec7a5fa-e3a1-495c-8211-af1b9696558b.png)
   As shown in the figure above, time_x is the main input used to train the 
model, and it has multiple components. The trainready table is the output of 
preprocessing and the input for model training. A program called 
tfrecord_reader.py converts the trainready data format into the model training 
data format.
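   Conceptually, the conversion cuts each ts_n series into (input window, target window) pairs; a rough sketch of that idea (hypothetical function, not the actual tfrecord_reader.py code):

```python
import numpy as np

def make_training_pairs(ts, train_window, predict_window):
    """Slide over a single ts_n series and cut out (time_x, time_y) pairs:
    time_x is the model input window, time_y the prediction target window."""
    pairs = []
    for start in range(len(ts) - train_window - predict_window + 1):
        time_x = ts[start:start + train_window]
        time_y = ts[start + train_window:
                    start + train_window + predict_window]
        pairs.append((time_x, time_y))
    return pairs

series = np.arange(82)                       # toy stand-in for one ts_n row
pairs = make_training_pairs(series, 60, 10)  # 60-day input, 10-day target
```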
   
   4. Attention: Explain attention heads = 10, and attention depth = 60
   These parameters are used when the "attention" option is turned on. The 
purpose of attention is to find correlations among different time steps. This 
model uses the attention concept rather than a standard implementation: it 
models the correlation among time steps with a convolution, which is faster 
than a typical attention implementation, but this approximation does not yield 
better performance. Domain-knowledge-based attention features, such as monthly 
or quarterly correlations, are added to the model to fill the gap, and were 
claimed to work better than using a full attention implementation in the model.
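   A minimal sketch of what such domain-knowledge features look like (hypothetical function and lag choices, assuming "monthly" ≈ 30 steps and "quarterly" ≈ 90 steps): instead of learning attention weights, the model is fed the series values at fixed seasonal lags as extra inputs.

```python
import numpy as np

def lagged_features(ts, t, lags=(30, 90)):
    """Return the series values `lag` steps before position t
    (0.0 when the lag falls before the start of the series)."""
    return [float(ts[t - lag]) if t - lag >= 0 else 0.0 for lag in lags]

series = np.arange(100, dtype=float)
feats = lagged_features(series, t=95)   # values at t-30 and t-90
```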
   
   5. Can you give mathematical description of the model?
   I'll add some math equations later. I suggest you read through the code and 
run a couple of sample data sets first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
