masahi commented on PR #80:
URL: https://github.com/apache/tvm-rfcs/pull/80#issuecomment-1162848166

   Thanks for the detailed feedback @JosephTheOctonaut! I'll update the doc 
accordingly, but here are my answers.
   
   > I don't think I saw it explicitly mentioned, but it is assumed/required 
that the backend executes committed chunks in order, right
   
   Correct, commit groups must execute in FIFO order. However, following the 
PTX spec, the order of completions within one commit group is not specified. 
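
   As a rough illustration of the semantics I have in mind, here is a toy 
Python model of commit groups. It is only a sketch, not TVM or PTX code: the 
class `AsyncCommitQueue` and its method names are made up for this example, 
and real hardware completes groups asynchronously rather than at the wait.

```python
from collections import deque


class AsyncCommitQueue:
    """Toy model of PTX-style commit groups (all names are hypothetical)."""

    def __init__(self):
        self.uncommitted = []   # async ops issued since the last commit
        self.pending = deque()  # committed groups, oldest first
        self.completed = []     # groups known to be complete, in FIFO order

    def issue(self, op):
        # An async operation starts out uncommitted.
        self.uncommitted.append(op)

    def commit(self):
        # A commit gathers *all* uncommitted ops issued so far into one group.
        self.pending.append(tuple(self.uncommitted))
        self.uncommitted = []

    def wait(self, max_in_flight):
        # After the wait, at most `max_in_flight` of the most recent groups
        # may still be pending; every older group is complete. The order of
        # completion *inside* a group is deliberately not modeled.
        while len(self.pending) > max_in_flight:
            self.completed.append(self.pending.popleft())


q = AsyncCommitQueue()
q.issue("copy A[0]")
q.issue("copy B[0]")
q.commit()            # group 0 = both copies above
q.issue("copy A[1]")
q.commit()            # group 1
q.wait(1)             # group 0 must be done, group 1 may still be in flight
```

   The model retires groups strictly oldest-first, which is the FIFO 
requirement; it says nothing about which copy inside group 0 lands first.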
   
   > The index "software_pipeline_async_stages" refers to elements of the 
"software_pipeline_stage" list, correct?
   
   Yes. I haven't put deep thought into the name choice; here I simply want to 
say "the index of stmt/block in the list" provided to 
`software_pipeline_stage` and `software_pipeline_order`. @vinx13 was also 
confused, so I should come up with a better name. Maybe 
`software_pipeline_async_statement_index`?
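
   To make the indexing concrete, a small plain-Python sketch (not TVMScript; 
the statement strings and the annotation values are invented for 
illustration, only the annotation names come from the RFC):

```python
# Three statements in a pipelined loop body, in program order (hypothetical).
stmts = [
    "A_shared = load(A)",
    "B_shared = load(B)",
    "C = compute(A_shared, B_shared)",
]

annotations = {
    # One entry per statement above, position i describes stmts[i].
    "software_pipeline_stage": [0, 0, 1],
    "software_pipeline_order": [0, 1, 2],
    # Indices into the lists above: statements 0 and 1 are async.
    "software_pipeline_async_stages": [0, 1],
}

async_stmts = [stmts[i] for i in annotations["software_pipeline_async_stages"]]
```

   Here `async_stmts` picks out the two loads, which is why a name mentioning 
"statement index" rather than "stages" might be less confusing.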
   
   > The definition of async_commit_stage(i) is to "Group one or more 
invocation of async operations...". Is the model here that all uncommitted 
async operations at that point in the program are committed? Furthermore, are 
"async operations" defined as any statement in a pipeline stage that was 
identified as async? 
   
   Both correct. I'm using "commit" in the same sense as PTX here. In the doc, 
I'm probably using "async operations" in places where "async commit groups" 
would be more exact, but I think I do use "async commit groups" wherever the 
distinction matters. 
   
   > I initially found it confusing that async_commit_stage(i) could appear 
inside and outside of an async_scope: section
   
   Yes, I agree that it was confusing. I was a bit informal since it is just 
pseudocode. As you said, the exact placement doesn't matter, both in the 
illustration and the implementation. I made it consistent in the doc. In the 
implementation, `async_scope` only encloses the operation itself. 
   
   > Is it correct that async_wait_stage will only block the "main" thread of 
execution?
   
   This is an interesting question that I haven't really thought about. I 
would expect that each async "engine" is represented by its own thread, so for 
example if a vector unit finds out that it needs to wait at some point, the 
thread that's running the vector unit should block. I hope this makes sense... 
I think this is a natural model, but as you said, I'm not sure if such details 
should be specified at the TIR level. 
   
   > It's interesting that you mention that there's no easy translation from 
tokens to counting (based on MLIR not having implemented one?), but you suspect 
the reverse could be simple. Does this suggest that the token system has less 
information encoded than the counting system? (I.e., we can go counting --> 
tokens but not the reverse because we lost information in the transformation.) 
Or is it just specifics of a PTX-like system, not a "counting system" in 
general, that make the translation to it hard?
   
   What I said is that the reverse seems more "feasible", not simple :) I would 
say the counting system relies on more implicit state in the HW, so going 
from tokens to counts requires uncovering that state from the given token 
alone. The lost information would be the notion of an ordering of async 
operations (or commit groups, to be exact). Given only a token, we don’t know 
(1) how many other async ops are in flight before that sync point and (2) how 
many of them can still be in flight after it (the latter is required by PTX). 
I claim that this is a difficult problem in general, and I gave the MLIR bit 
as a data point. 
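
   To show why the counting-to-token direction looks feasible, here is a 
minimal sketch for a straight-line trace only (no loops or pipelining). The 
function `counts_to_tokens` and the trace encoding are assumptions made up 
for this example; a "token" is simply the index of a commit group.

```python
def counts_to_tokens(trace):
    """Turn count-based waits into explicit per-group waits.

    `trace` is a list of ("commit",) and ("wait", n) tuples, where n is the
    number of most recent commit groups allowed to stay in flight.
    Returns, for each wait, the group ids (tokens) it newly requires.
    """
    committed = 0  # commit groups created so far
    retired = 0    # groups already known complete from earlier waits
    waits = []
    for op in trace:
        if op[0] == "commit":
            committed += 1
        elif op[0] == "wait":
            # Groups 0 .. committed - n - 1 must be complete after this wait.
            must_be_done = max(committed - op[1], 0)
            waits.append(list(range(retired, must_be_done)))
            retired = max(retired, must_be_done)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return waits


trace = [("commit",), ("commit",), ("wait", 1), ("commit",), ("wait", 0)]
tokens_per_wait = counts_to_tokens(trace)
```

   The running counters `committed` and `retired` are exactly the implicit 
state mentioned above. Going the other way, a lone token carries neither 
counter, so they would have to be recovered from the surrounding program.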

