DickJC123 commented on a change in pull request #7347: Tensorcore conv deconv 
URL: https://github.com/apache/incubator-mxnet/pull/7347#discussion_r132093608

 File path: src/operator/cudnn_algoreg-inl.h
 @@ -40,12 +65,17 @@ class CuDNNAlgoReg {
     oss << "cudnn_data_type=" << cudnn_data_type << ";";
     oss << "cudnn_forward_compute_type=" << cudnn_forward_compute_type << ";";
     oss << "cudnn_backward_compute_type=" << cudnn_backward_compute_type << ";";
+    // A system could be heterogeneous and thus have different algo choices for different
+    // device ids.  'device_id' could possibly be replaced with gpu compute capability,
+    // but identical GPUs could technically have different clock settings.
+    oss << "device_id=" << device_id << ";";
 Review comment:
   I'll update the PR tomorrow to substitute the GPU's compute capability for the device id in the key. This will still ensure proper operation on a workstation with both a Pascal and a Volta GPU, yet will speed up algo selection on an 8-way homogeneous system, since an algo chosen for one GPU can then be reused by the other seven.
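To make the keying change concrete, here is a minimal C++ sketch under stated assumptions, not the PR's code: `ConvKeyParams` and `MakeKey` are hypothetical names, and the `sm_arch` field is assumed to encode compute capability as major*10 + minor (so 6.0 for a Tesla P100 and 7.0 for a Tesla V100 become 60 and 70). Two identical GPUs then yield the same key and can share one registry entry, while a Pascal/Volta mix still yields distinct keys.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the parameters that feed the algo-registry key.
struct ConvKeyParams {
  std::string conv_param;            // serialized convolution parameters
  int cudnn_data_type;               // numeric cuDNN data-type value
  int cudnn_forward_compute_type;
  int cudnn_backward_compute_type;
  int sm_major;                      // GPU compute capability, major part
  int sm_minor;                      // GPU compute capability, minor part
};

// Builds the key in the same "name=value;" style as the diff, but records
// the compute capability (assumed encoding) instead of the device id.
inline std::string MakeKey(const ConvKeyParams& p) {
  // Packing as major*10 + minor is only unambiguous for single-digit minors.
  assert(p.sm_minor >= 0 && p.sm_minor < 10);
  std::ostringstream oss;
  oss << p.conv_param << ";";
  oss << "cudnn_data_type=" << p.cudnn_data_type << ";";
  oss << "cudnn_forward_compute_type=" << p.cudnn_forward_compute_type << ";";
  oss << "cudnn_backward_compute_type=" << p.cudnn_backward_compute_type << ";";
  oss << "sm_arch=" << (10 * p.sm_major + p.sm_minor) << ";";
  return oss.str();
}
```

For example, two P100 cards described as `ConvKeyParams{"kernel=(3,3)", 2, 0, 0, 6, 0}` produce equal keys, while setting `sm_major` to 7 for a V100 produces a different key ending in `sm_arch=70;`.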
   Regarding key compaction, I'll point out that all key look-ups are performed during the graph construction phase, never during inference or training. Also, the key look-up times are probably dwarfed by any Find() calls that run during this same phase. I'm not sure how the look-up times compare to the Get() calls performed when auto-tuning is turned off.
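The cost argument can be illustrated with a hypothetical memoizing registry (`AlgoRegistrySketch` and `GetOrBenchmark` are illustrative names, not MXNet's API). A hit is a single `std::map` look-up, while a miss runs a caller-supplied `benchmark` callback standing in for the cuDNN Find() calls, so the look-up cost only matters relative to that benchmark, and only while the graph is being built.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry that memoizes the chosen algo per key.
// Intended to be consulted only during graph construction.
class AlgoRegistrySketch {
 public:
  // Returns the cached algo for `key`; runs the expensive `benchmark`
  // (a stand-in for cuDNN Find*() calls) only on a cache miss.
  int GetOrBenchmark(const std::string& key,
                     const std::function<int()>& benchmark) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // cheap path: map look-up
    ++benchmark_calls_;
    int algo = benchmark();                      // expensive path
    cache_.emplace(key, algo);
    return algo;
  }

  // Number of times a benchmark actually ran (i.e. cache misses).
  int benchmark_calls() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return benchmark_calls_;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, int> cache_;
  int benchmark_calls_ = 0;
};
```

With keys built from compute capability, eight identical GPUs requesting the same key trigger one benchmark instead of eight. Holding the lock across `benchmark` is a deliberate simplification: a second thread asking for the same key waits for the first result instead of re-running the benchmark, at the price of serializing misses on unrelated keys.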
