[GitHub] [tvm] zhuwenxi edited a comment on issue #7246: [BUG][Tensorize] race condition when using "tvm.tir.call_packed()" in a parallel schedule.

GitBox Thu, 14 Jan 2021 19:40:09 -0800


zhuwenxi edited a comment on issue #7246:
URL: https://github.com/apache/tvm/issues/7246#issuecomment-760623570



   @tqchen
   Just curious: as far as I know, nested parallel loops are not allowed in the CPU backend (https://github.com/apache/tvm/blob/main/src/target/llvm/codegen_cpu.cc#L994), so I suppose you mean other backends, such as GPU?

   A thread-local stack does make sense. Is it true that the `packed_arg_alloca` TIR is only generated when the current function is inside a "parallel" for loop?

   If so, there will be no performance issue. Otherwise, a purely single-threaded schedule (no "parallel()" at all) could see a performance degradation, because there would be multiple thread-local stacks where they could have shared a single global stack in the first place.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

