masahi commented on code in PR #15656:
URL: https://github.com/apache/tvm/pull/15656#discussion_r1315572243


##########
python/tvm/tir/op.py:
##########
@@ -1458,16 +1512,42 @@ def ptx_arrive_barrier(barrier_arr, barrier_id):
     return call_intrin("", "tir.ptx_arrive_barrier", barrier_arr, barrier_id)
 
 
+def ptx_arrive_barrier_expect_tx(barrier_arr, barrier_id, byte_count):
+    """TVM intrinsic for ptx barrier arrival with expect tx using 
mbarrier.arrive.expect_tx
+    
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive
+    
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-expect-tx-operation
+
+    Parameters
+    ----------
+    barrier_arr : string
+        The name of the barrier array in shared memory.
+
+    barrier_id : int
+        Index into the barrier array.
+
+    byte_count : int
+        Increases the the tx count of the mbarrier object to track completion 
of

Review Comment:
   typo `the the`



##########
include/tvm/tir/builtin.h:
##########
@@ -645,14 +645,29 @@ TVM_DLL const Op& ptx_mma_sp();
 TVM_DLL const Op& ptx_ldmatrix();
 
 /*!
- * \brief tvm intrinsics for ptx async copy from global to shared memory
- *
- * void ptx_cp_async(Var shared_ptr, Expr shared_offset, Var global_ptr, Expr 
global_offset, size_t
- * bytes);
+ * \brief tvm intrinsics for ptx async copy from global to shared memory using 
cp.async
  *
+ * void ptx_cp_async(Var shared_ptr,
+ *                   Expr shared_offset,
+ *                   Var global_ptr,
+ *                   Expr global_offset,
+ *                   size_t bytes);
  */
 TVM_DLL const Op& ptx_cp_async();
 
+/*!
+ * \brief tvm intrinsics for ptx async copy from global to shared memory using 
cp.async.bulk
+ *
+ * void ptx_cp_async(Var shared_ptr,
+ *                   Expr shared_offset,
+ *                   Var global_ptr,
+ *                   Expr global_offset,
+ *                   size_t bytes,
+ *                   string barrier_arr,
+ *                   int barrier_id);
+ */
+TVM_DLL const Op& ptx_cp_async_bulk();

Review Comment:
   I just realized that a barrier is specified by a string / name in the intrin 
signature rather than a pointer variable. Is there a reason a barrier cannot be 
a `Var`, like src / dst pointers of the `cp.async` intrin? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to