[ https://issues.apache.org/jira/browse/SINGA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhangzhaoqi updated SINGA-506:
------------------------------
    Description:
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad and GPT-2.*

*In total, 19 operators still need to be added, as follows:*
Transpose, easy, 0.5 days
ConstantOfShape, easy, 0.5 days
ReduceMax, easy, 0.5 days
ReduceMean, easy, 0.5 days
ReduceSum, easy, 0.5 days
Shape, easy, 0.5 days
Slice, easy, 0.5 days
Dropout, easy, 0.5 days
Hardmax, easy, 1 day
NonZero, easy, 1 day
Split, easy, 1 day
Tile, easy, 1 day
Ceil, easy, 1 day
Compress, easy, 1 day
Gather, complicated, 2-3 days, C++
Cast, hard, changes the data type, may not be feasible
CategoryMapper, not in the ONNX documentation (only for Bidirectional Attention Flow)
ArgMax, complicated, 2-3 days, C++ (only for Bidirectional Attention Flow)
Scan, hard, functional programming constructs, 1-2 weeks (only for Bidirectional Attention Flow)

*In detail, these 19 operators are used by the three models as follows:*
*Bidirectional Attention Flow:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
ReduceMax
ReduceSum
Scan
Shape
Slice
Transpose
*BERT-Squad:*
Slice
Shape
Gather
ReduceMean
Cast
Tile
Transpose
Split
*GPT-2:*
ConstantOfShape
Slice
Shape
Gather
ReduceMean
NonZero
Cast
Transpose
Split

  was:
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad and GPT-2.*

*In total, 21 operators still need to be added, as follows:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
Identity
NonZero
ReduceMax
ReduceMean
ReduceSum
Scan
Shape
Slice
Split
Squeeze
Tile
Transpose

*In detail, these 21 operators are used by the three models as follows:*
*Bidirectional Attention Flow:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
ReduceMax
ReduceSum
Scan
Shape
Slice
Squeeze
Transpose
*BERT-Squad:*
Slice
Squeeze
Shape
Identity
Gather
ReduceMean
Cast
Tile
Transpose
Split
*GPT-2:*
ConstantOfShape
Slice
Shape
Gather
ReduceMean
NonZero
Cast
Transpose
Split

> add autograd operators for NLP models
> -------------------------------------
>
>                 Key: SINGA-506
>                 URL: https://issues.apache.org/jira/browse/SINGA-506
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: zhangzhaoqi
>            Priority: Major
>
> *We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad and GPT-2.*
>
> *In total, 19 operators still need to be added, as follows:*
> Transpose, easy, 0.5 days
> ConstantOfShape, easy, 0.5 days
> ReduceMax, easy, 0.5 days
> ReduceMean, easy, 0.5 days
> ReduceSum, easy, 0.5 days
> Shape, easy, 0.5 days
> Slice, easy, 0.5 days
> Dropout, easy, 0.5 days
> Hardmax, easy, 1 day
> NonZero, easy, 1 day
> Split, easy, 1 day
> Tile, easy, 1 day
> Ceil, easy, 1 day
> Compress, easy, 1 day
> Gather, complicated, 2-3 days, C++
> Cast, hard, changes the data type, may not be feasible
> CategoryMapper, not in the ONNX documentation (only for Bidirectional Attention Flow)
> ArgMax, complicated, 2-3 days, C++ (only for Bidirectional Attention Flow)
> Scan, hard, functional programming constructs, 1-2 weeks (only for Bidirectional Attention Flow)
>
> *In detail, these 19 operators are used by the three models as follows:*
> *Bidirectional Attention Flow:*
> ArgMax
> Cast
> CategoryMapper
> Ceil
> Compress
> ConstantOfShape
> Dropout
> Gather
> Hardmax
> ReduceMax
> ReduceSum
> Scan
> Shape
> Slice
> Transpose
> *BERT-Squad:*
> Slice
> Shape
> Gather
> ReduceMean
> Cast
> Tile
> Transpose
> Split
> *GPT-2:*
> ConstantOfShape
> Slice
> Shape
> Gather
> ReduceMean
> NonZero
> Cast
> Transpose
> Split

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
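
The operator counts in the description (19 now, 21 before) can be cross-checked from the per-model lists. The following is only a sketch of that arithmetic in plain Python sets. The variable names are made up for illustration and are not part of SINGA. The operator names are copied from the lists in this message.

```python
# Per-model operator lists, copied from the updated description.
bidaf = {
    "ArgMax", "Cast", "CategoryMapper", "Ceil", "Compress",
    "ConstantOfShape", "Dropout", "Gather", "Hardmax", "ReduceMax",
    "ReduceSum", "Scan", "Shape", "Slice", "Transpose",
}
bert_squad = {
    "Slice", "Shape", "Gather", "ReduceMean", "Cast", "Tile",
    "Transpose", "Split",
}
gpt2 = {
    "ConstantOfShape", "Slice", "Shape", "Gather", "ReduceMean",
    "NonZero", "Cast", "Transpose", "Split",
}

# Per-model lists from the previous ("was:") description. They differ
# from the new ones only by Squeeze and Identity.
old_bidaf = bidaf | {"Squeeze"}
old_bert_squad = bert_squad | {"Squeeze", "Identity"}
old_gpt2 = set(gpt2)

new_ops = bidaf | bert_squad | gpt2              # operators still to add
old_ops = old_bidaf | old_bert_squad | old_gpt2  # previous total
shared = bidaf & bert_squad & gpt2               # needed by all three models
removed = old_ops - new_ops                      # dropped in this update

print(len(new_ops))     # 19
print(len(old_ops))     # 21
print(sorted(shared))   # ['Cast', 'Gather', 'Shape', 'Slice', 'Transpose']
print(sorted(removed))  # ['Identity', 'Squeeze']
```

The five shared operators are needed by every model, so under this reading they unblock the most work per operator. Cast and Gather are also among the ones rated hard or complicated above.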