MasterJH5574 opened a new pull request, #15192: URL: https://github.com/apache/tvm/pull/15192
This PR enhances the LowerCrossThreadReduction pass with a rewrite for thread-broadcasting blocks. Previously, whenever a TIR block had thread-broadcast behavior (i.e., some thread var is free for the block), the pass never inserted a predicate for the block, so the generated code had a race condition that sometimes led to wrong computation results. This PR makes the pass collect thread var information during the transformation and rewrite each thread-broadcast TIR block with additional predicate clauses that bound the thread vars, effectively stating "only execute the block when `thread_var == 0`". This resolves the race condition in such blocks.
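To illustrate why the predicate matters, here is a minimal sketch in plain Python. It does not use any TVM API, and it is not the pass's implementation. The function `run_block`, its parameters, and the in-place update are all hypothetical names chosen for illustration. The loop runs the "threads" one after another, so the unguarded result is deterministic here; on a real GPU the threads run concurrently and the unguarded outcome would be unpredictable instead.

```python
from typing import List


def run_block(num_threads: int, buf: List[float], guarded: bool) -> List[float]:
    """Simulate a block whose body does not depend on the thread index tx.

    Hypothetical model: every thread reaches the block, and the body
    updates buf[0] in place. With guarded=True, the body only runs
    when tx == 0, mimicking the predicate described in the PR.
    """
    out = list(buf)  # copy so the caller's buffer is left untouched
    for tx in range(num_threads):  # each iteration stands in for one thread
        if guarded and tx != 0:
            continue  # predicate: only execute the block when tx == 0
        out[0] = out[0] + 1.0  # body is independent of tx
    return out


unguarded = run_block(4, [10.0], guarded=False)
guarded = run_block(4, [10.0], guarded=True)
print(unguarded, guarded)  # prints "[14.0] [11.0]"
```

In this model the intended result is a single update (`11.0`). Without the guard, every thread repeats the update, which stands in for the conflicting writes that the missing predicate allowed.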
