I have had the same experience that Patric describes: I tried to use a model whose operators carried hardware-specific attributes (cudnn_off in my case) and was unable to use the model more generally. However, I also appreciate what Dick is proposing, and I too see a need for hardware-specific "customization" of operators that is scalable.
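For concreteness, here is a minimal sketch of how that happens (toy network; cudnn_off is a real Convolution parameter, and the point is that it gets serialized into the exported symbol JSON and travels with the model to whatever hardware loads it):

    import mxnet as mx

    # Toy symbol with a hardware-specific knob baked in at definition time.
    data = mx.sym.Variable('data')
    conv = mx.sym.Convolution(data=data, num_filter=8, kernel=(3, 3),
                              cudnn_off=True, name='conv0')

    # The attribute is serialized with the symbol, so every consumer of this
    # JSON inherits a CUDA/cuDNN-specific decision, whatever they run on.
    print(conv.tojson())  # attrs include "cudnn_off": "True"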
Instead, I propose that we have a way of providing hardware-specific operator attributes that is orthogonal to the MXNet (NNVM) operator registration abstraction. When we need to adapt an operator for specific hardware (a processor or accelerator), we shouldn't need to modify MXNet's source code. One possibility is to use the Accelerator API proposal (https://cwiki.apache.org/confluence/display/MXNET/Bring+your+own+Accelerator) to treat each processor (CPU, GPU:CUDA, GPU:CUDNN, GPU:TensorRT, etc.) as a separate hardware backend, clearly demarcating the "pure" operator definition from the hardware-specific attributes attached to it.

Sam

On Jun 4, 2019, at 7:29 PM, Zhao, Patric <[email protected]> wrote:

Thanks for the new proposal.

My concern with the current proposal is that the script/code will NOT be portable or backward compatible, and that putting backend-specific info in the operator increases the complexity of usage.

Say a user sets backend parameters in their script, such as conv algo=Winograd, precision=fp16, layout=NHWC, etc. This group of parameters may give the best performance on the tested HW, but on different HW it can cause performance degradation or even fail to execute. One example is the GitHub issue where the `layout` parameter caused an error: https://github.com/apache/incubator-mxnet/issues/15079

Thus, I think we need to remove this kind of context-specific operator parameter, like `cudnn_tune`, `cudnn_off`, `layout`, rather than adding more parameters to operators. I suggest hiding this kind of optimization and selection in the backend, perhaps using subgraphs.

Thanks,

--Patric

-----Original Message-----
From: Dick Carter [mailto:[email protected]]
Sent: Tuesday, June 4, 2019 8:21 AM
To: [email protected]
Subject: Context-specific operator parameters

MXNet has a number of context-specific operator parameters: 'cudnn_tune', 'cudnn_off' and 'workspace' are parameters that control the behavior of Convolution on gpu contexts with NVIDIA gpus. Even with these, there would be benefits to having additional parameters, e.g. to set Convolution algos by number, or to force the compute precision to float16. With the desire to support multiple backends and a growing number of operators, it's time to ask the question, "Is this scalable?"

I propose that, rather than adding a new parameter at the Python level for each new backend-specific parameter 'knob', all context-specific parameters be swept into a single dictionary, called e.g. 'ctx_params':

    Convolution(..., ctx_params={'cudnn_tune': 2, 'cudnn_off': False, 'workspace': 2000}, ...)

I'll stop short of working out all the details, in the hope of generating more discussion. Some open questions: Do all backends share the same namespace, or do we have separate 'gpu_ctx_params', 'cpu_ctx_params', etc.? Is there a clean extension to dmlc's general parameter-parsing facility to handle this dictionary, and what form would these extension params take in the backend, Map<string,string>? And while this proposal organizes and consolidates the context-specific parameters at the Python level, we'd still need to tolerate (and auto-generate) documentation for these new parameters.

Other approaches welcome.
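For illustration, a side-by-side sketch of today's Python surface versus the proposed form (ctx_params is hypothetical and not implemented; the dict contents mirror Dick's example above):

    import mxnet as mx

    data = mx.sym.Variable('data')

    # Today: backend-specific knobs are first-class operator parameters.
    conv_today = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                                    cudnn_tune='fastest', cudnn_off=False,
                                    workspace=2000)

    # Proposed (hypothetical): the same knobs swept into one dictionary, so
    # the core operator signature stays backend-neutral.
    conv_proposed = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                                       ctx_params={'cudnn_tune': 2,
                                                   'cudnn_off': False,
                                                   'workspace': 2000})

Under such a scheme, a backend that doesn't recognize a key could presumably ignore it or warn, which would also speak to the portability concern Patric raises.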
