comaniac commented on issue #7135: URL: https://github.com/apache/tvm/issues/7135#issuecomment-748537304
OK, so the root cause is that your script uses `opt_level=0` when building the model, while auto_scheduler task extraction uses `opt_level=3`. I changed `opt_level` in `search_dense_gpu.py` to 3 and here is what I got:

```
Compile...
-----------------------------------
Cannot find tuned schedules for target=metal -keys=metal,gpu -max_num_threads=256, workload_key=["13da82b16db5a9fde8953f4c5667d2e4"]. A fallback TOPI schedule is used, which may bring great performance regression or even compilation failure. Compute DAG info:
placeholder = PLACEHOLDER [1, 768]
placeholder = PLACEHOLDER [768, 768]
T_dense(i, j) += (placeholder[i, k]*placeholder[j, k])
placeholder = PLACEHOLDER [768]
T_add(ax0, ax1) = (T_dense[ax0, ax1] + placeholder[ax1])
T_minimum(ax0, ax1) = min(T_add[ax0, ax1], 9f)
T_maximum(ax0, ax1) = max(T_minimum[ax0, ax1], -9f)
T_fast_tanh(ax0, ax1) = ((T_maximum[ax0, ax1]*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ..(OMITTED).. *T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*1.19826e-06f) + 0.000118535f)) + 0.00226843f)) + 0.00489353f))
```

The task hash code `13da82b16db5a9fde8953f4c5667d2e4` matches one of the extracted tasks from the model:

```
========== Task 9 (workload key: ["13da82b16db5a9fde8953f4c5667d2e4"]) ==========
placeholder = PLACEHOLDER [1, 768]
placeholder = PLACEHOLDER [768, 768]
T_dense(i, j) += (placeholder[i, k]*placeholder[j, k])
placeholder = PLACEHOLDER [768]
T_add(ax0, ax1) = (T_dense[ax0, ax1] + placeholder[ax1])
T_minimum(ax0, ax1) = min(T_add[ax0, ax1], 9f)
T_maximum(ax0, ax1) = max(T_minimum[ax0, ax1], -9f)
T_fast_tanh(ax0, ax1) = ((T_maximum[ax0, ax1]*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ..(OMITTED).. *T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*1.19826e-06f) + 0.000118535f)) + 0.00226843f)) + 0.00489353f))
```

In conclusion, this is not really a bug, but we may need to come up with a way to improve how task extraction is configured. I'm closing this issue for now, and we could have an RFC on the discuss forum.

cc @merrymercy

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
