[ https://issues.apache.org/jira/browse/SINGA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhangzhaoqi updated SINGA-506:
------------------------------
    Description:
*We are going to support two NLP models, BERT-Squad and GPT-2.*

*In total, 13 operators still need to be added, as listed below.*

*In detail, the last two columns show which of the two models uses each operator:*
|*Operator*|*Rank*|*Workload*|*Comments*|*BERT-Squad*|*GPT-2*|
|-*Transpose*-|easy|1h|Transposes the input tensor, similar to numpy.transpose.|T|T|
|-*ConstantOfShape*-|easy|2h|Generates a tensor with a given value and shape.| |T|
|-*Shape*-|easy|2h|Takes a tensor as input and outputs a 1-D int64 tensor containing the shape of the input tensor.|T|T|
|-*Dropout*-|easy|3h|Takes an input floating-point tensor and an input ratio (floating-point scalar), and produces two tensor outputs: output (floating-point tensor) and mask (Tensor<bool>).| | |
|-*Ceil*-|easy|4h|y = ceil(x)| | |
|-*ReduceMean*-|easy|4h|Computes the mean of the input tensor's elements along the provided axes.|T|T|
|-*ReduceSum*-|easy|4h|Computes the sum of the input tensor's elements along the provided axes.| | |
|-*Slice*-|easy|4h|Produces a slice of the input tensor along multiple axes.|T|T|
|-*NonZero*-|easy|12h|Returns the indices of the elements that are non-zero (in row-major order, by dimension).| |T|
|-*Split*-|easy|12h|Splits a tensor into a list of tensors along the specified 'axis'.|T|T|
|-*Tile*-|easy|1d|Constructs a tensor by tiling a given tensor. This is the same as the tile function in NumPy, but without broadcasting. For example, with A = [[1, 2], [3, 4]] and B = [1, 2], tile(A, B) = [[1, 2, 1, 2], [3, 4, 3, 4]].|T| |
|-*Gather*-|complicated|3d|Given a data tensor of rank r >= 1 and an indices tensor of rank q, gathers the entries of the axis dimension of data (by default the outermost one, axis=0) indexed by indices, and concatenates them.|T|T|
|-*Cast*-|hard|-|Casts the elements of a given input tensor to the data type specified by the 'to' argument and returns an output tensor of the same size in the converted type.|T|T|

*BERT-Squad:*
Slice
Shape
Gather
ReduceMean
Cast
Tile
Transpose
Split

*GPT-2:*
ConstantOfShape
Slice
Shape
Gather
ReduceMean
NonZero
Cast
Transpose
Split

  was:
The same description, except that the *Cast* row in the table was not struck through:
|*Cast*|hard|-|Casts the elements of a given input tensor to the data type specified by the 'to' argument and returns an output tensor of the same size in the converted type.|T|T|

> add autograd operators for NLP models
> -------------------------------------
>
>                 Key: SINGA-506
>                 URL: https://issues.apache.org/jira/browse/SINGA-506
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: zhangzhaoqi
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
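
As a rough illustration of the behavior described in the Comments column for Tile, Gather, NonZero, Shape and ConstantOfShape, the sketch below uses plain NumPy calls. It is not SINGA's autograd implementation: the variable names and sample values are made up for the example, and it assumes that np.take, np.nonzero and np.full are acceptable stand-ins for Gather, NonZero and ConstantOfShape.

```python
import numpy as np

# Tile: repeat A along each axis as given by B (the example from the table).
A = np.array([[1, 2], [3, 4]])
B = [1, 2]
tiled = np.tile(A, B)  # [[1, 2, 1, 2], [3, 4, 3, 4]]

# Gather: pick entries along `axis` of `data` using an index tensor.
# The result has shape indices.shape + data.shape[1:] when axis=0.
data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]])  # hypothetical values
indices = np.array([[0, 1], [1, 2]])
gathered = np.take(data, indices, axis=0)  # shape (2, 2, 2)

# NonZero: indices of the non-zero elements, one row per dimension,
# listed in row-major order.
x = np.array([[1, 0], [1, 1]])
nonzero = np.array(np.nonzero(x), dtype=np.int64)  # [[0, 1, 1], [0, 0, 1]]

# Shape: a 1-D int64 tensor holding the shape of the input.
shape = np.array(data.shape, dtype=np.int64)  # [3, 2]

# ConstantOfShape: a tensor of the given shape filled with a single value.
const = np.full((2, 3), 1.0, dtype=np.float32)
```

An actual autograd operator would also need a backward pass for the differentiable operators, which this sketch does not attempt.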