[ https://issues.apache.org/jira/browse/SINGA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhangzhaoqi updated SINGA-506:
------------------------------
Description:
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad, and GPT-2.*

*In total, there are still 19 operators that we need to add, listed below:*

|*Operator*|*Rank*|*Workload*|*Comments*|*Bidirectional Attention Flow*|*BERT-Squad*|*GPT-2*|
|*Transpose*|easy|1h|Transposes the input tensor, similar to numpy.transpose.|T|T|T|
|*ConstantOfShape*|easy|2h|Generates a tensor with a given value and shape.|T| |T|
|*Shape*|easy|2h|Takes a tensor as input and outputs a 1D int64 tensor containing the shape of the input tensor.|T|T|T|
|*Dropout*|easy|3h|Takes an input floating-point tensor and an input ratio (a floating-point scalar), and produces two outputs: output (a floating-point tensor) and mask (a Tensor<bool>).|T| | |
|*Ceil*|easy|4h|y = ceil(x)|T| | |
|*ReduceMax*|easy|4h|Computes the max of the input tensor's elements along the provided axes.|T| | |
|*ReduceMean*|easy|4h|Computes the mean of the input tensor's elements along the provided axes.| |T|T|
|*ReduceSum*|easy|4h|Computes the sum of the input tensor's elements along the provided axes.|T| | |
|*Slice*|easy|4h|Produces a slice of the input tensor along multiple axes.|T|T|T|
|*Compress*|easy|6h|Selects slices from an input tensor along a given axis where the condition evaluates to True for each axis index.|T| | |
|*Hardmax*|easy|6h|Computes the hardmax (1 for the first maximum value, 0 for all others) for each layer in the batch of the given input.|T| | |
|*NonZero*|easy|12h|Returns the indices of the elements that are non-zero (in row-major order, by dimension).| | |T|
|*Split*|easy|12h|Splits a tensor into a list of tensors along the specified axis.| |T|T|
|*Tile*|easy|1d|Constructs a tensor by tiling a given tensor. Same as numpy.tile, but with no broadcast. For example, for A = [[1, 2], [3, 4]] and B = [1, 2], tile(A, B) = [[1, 2, 1, 2], [3, 4, 3, 4]].| |T| |
|*ArgMax*|complicated|2d|Computes the indices of the max elements of the input tensor along the provided axis.|T| | |
|*Gather*|complicated|3d|Given a data tensor of rank r >= 1 and an indices tensor of rank q, gathers entries of the axis dimension of data (by default the outermost one, as axis=0) indexed by indices, and concatenates them.|T|T|T|
|*Scan*|hard|2w|Iterates over one or more scan_input tensors, constructing zero or more scan_output tensors. It combines ideas from general recurrences and from functional programming constructs such as scan, fold, map, and zip, and is intended to enable generalizations of RNN-like constructs for sequence-to-sequence processing.|T| | |
|*Cast*|hard|-|Casts the elements of the input tensor to the data type specified by the 'to' argument and returns an output tensor of the same size in the converted type.|T|T|T|
|*CategoryMapper*| |-|Not in the ONNX documentation.|T| | |
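To pin down the expected behaviour of a few of the less familiar entries, here is a small numpy sketch of their reference semantics, following the ONNX descriptions above. This is illustration only, not SINGA code; the hardmax helper is defined here just for the example and is not an existing numpy or SINGA function.

{code:python}
import numpy as np

# Tile: same as numpy.tile, but the repeats are not broadcast.
A = np.array([[1, 2], [3, 4]])
assert (np.tile(A, [1, 2]) == [[1, 2, 1, 2], [3, 4, 3, 4]]).all()

# Hardmax: 1 for the first maximum value along the axis, 0 for all others.
def hardmax(x, axis=-1):
    out = np.zeros_like(x)
    first_max = np.expand_dims(np.argmax(x, axis=axis), axis)
    np.put_along_axis(out, first_max, 1, axis=axis)
    return out

assert (hardmax(np.array([[1.0, 3.0, 3.0]])) == [[0.0, 1.0, 0.0]]).all()

# Compress: keep the slices along an axis where the condition is True;
# numpy.compress has the same semantics.
x = np.arange(6).reshape(3, 2)
assert (np.compress([True, False, True], x, axis=0) == x[[0, 2]]).all()
{code}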
*For details, these 19 operators belong to the three models as follows:*

*Bidirectional Attention Flow:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
ReduceMax
ReduceSum
Scan
Shape
Slice
Transpose

*BERT-Squad:*
Slice
Shape
Gather
ReduceMean
Cast
Tile
Transpose
Split

*GPT-2:*
ConstantOfShape
Slice
Shape
Gather
ReduceMean
NonZero
Cast
Transpose
Split
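Since these are autograd operators, each of them also needs a backward pass. Two sanity checks of the gradient rules involved, again sketched in numpy for illustration only (not the SINGA implementation): the backward of Transpose(perm) is a transpose with the inverse permutation, and the backward of Gather is a scatter-add of the upstream gradient.

{code:python}
import numpy as np

# Transpose: composing a permutation with its inverse is the identity,
# so transposing the gradient with the inverse permutation undoes the
# forward transpose.
perm = (2, 0, 1)
inv = tuple(np.argsort(perm))  # inverse permutation: inv[perm[i]] == i
x = np.random.rand(2, 3, 4)
assert (x.transpose(perm).transpose(inv) == x).all()

# Gather (axis=0): the forward pass picks rows by index; the backward
# pass scatters the upstream gradient back to those rows, accumulating
# where an index is repeated.
data = np.random.rand(5, 3)
indices = np.array([0, 2, 2])
y = np.take(data, indices, axis=0)   # Gather forward
dy = np.ones_like(y)                 # upstream gradient
ddata = np.zeros_like(data)
np.add.at(ddata, indices, dy)        # scatter-add backward
assert ddata[2, 0] == 2.0            # row 2 was gathered twice
{code}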
> add autograd operators for NLP models
> -------------------------------------
>
>     Key: SINGA-506
>     URL: https://issues.apache.org/jira/browse/SINGA-506
>     Project: Singa
>     Issue Type: New Feature
>     Reporter: zhangzhaoqi
>     Priority: Major