masahi commented on PR #80: URL: https://github.com/apache/tvm-rfcs/pull/80#issuecomment-1162848166
Thanks for the detailed feedback @JosephTheOctonaut! I'll update the doc accordingly, but here are my answers.

> I don't think I saw it explicitly mentioned, but it is assumed/required that the backend executes committed chunks in order, right

Correct, commit groups must execute in FIFO order. But the order of completions within one commit group is not specified, following the PTX spec.

> The index "software_pipeline_async_stages" refers to elements of the "software_pipeline_stage" list, correct?

Yes. I haven't put deep thought into the name choice; here I simply want to say "the index of the stmt/block in the list" provided to `software_pipeline_stage` and `software_pipeline_order`. @vinx13 was also confused, so I should come up with a better name. Maybe `software_pipeline_async_statement_index`?

> The definition of async_commit_stage(i) is to "Group one or more invocation of async operations...". Is the model here that all uncommitted async operations at that point in the program are committed? Furthermore, are "async operations" defined as any statement in a pipeline stage that was identified as async?

Both correct. I'm using "commit" in the same sense as PTX here. In the doc, I'm probably using "async operations" in places where, to be exact, I should be using "async commit groups". But I think I'm using "async commit groups" when the distinction matters.

> I initially found it confusing that async_commit_stage(i) could appear inside and outside of an async_scope: section

Yes, I agree that it was confusing. I was a bit informal since it is just pseudocode. As you said, the exact placement doesn't matter, both in the illustration and the implementation. I made it consistent in the doc. In the implementation, `async_scope` only encloses the operation itself.

> Is it correct that async_wait_stage will only block the "main" thread of execution?

This is an interesting question that I haven't really thought about.
I would expect that each async "engine" is represented by its own thread, so for example, if a vector unit finds out that it needs to wait at some point, the thread that's running the vector unit should block. I hope this makes sense... I think this is a natural model but, as you said, I'm not sure if such details should be specified at the TIR level.

> It's interesting that you mention that there's no easy translation from tokens to counting (based on MLIR not having implemented one?), but you suspect the reverse could be simple. Does this suggest that the token system has less information encoded than the counting system? (I.e., we can go counting --> tokens but not the reverse because we lost information in the transformation.) Or is it just specifics of a PTX-like system, not a "counting system" in general, that make the translation to it hard?

What I said is that the reverse seems more "feasible", not simple :) I would say that the counting system relies on more implicit state in the HW, so going from tokens to counts requires uncovering that state from the given token alone. The lost information would be the notion of an ordering of async operations (or commit groups, to be exact). Given only a token, we don't know (1) how many other async ops are in flight before that sync point and (2) how many of them can still be in flight afterwards (the latter is required by PTX). I claim that this is a difficult problem in general, and I gave the MLIR bit as a data point.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
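
As an illustration of the counting semantics discussed in the comment above, here is a toy Python model. It is only a sketch, not TVM or PTX code: the class `AsyncStageModel` and its methods `issue`, `commit`, `wait`, and `wait_token` are hypothetical names, and the model assumes a single async stage whose commit groups retire strictly in FIFO order. `commit` groups every op issued since the previous commit, and `wait(n)` blocks until at most `n` of the most recent groups are still in flight.

```python
from collections import deque


class AsyncStageModel:
    """Toy model of one async stage with PTX-like counting semantics.

    Hypothetical illustration only; none of these names are TVM APIs.
    """

    def __init__(self):
        self.uncommitted = []     # async ops issued since the last commit
        self.in_flight = deque()  # tokens of committed groups, oldest first
        self.groups = {}          # token -> ops belonging to that group
        self.completed = []       # tokens of retired groups, in retirement order
        self.num_committed = 0    # implicit state: number of groups committed so far

    def issue(self, op):
        # An async op is not tracked by any wait until it is committed.
        self.uncommitted.append(op)

    def commit(self):
        # All ops issued since the previous commit form one commit group.
        token = self.num_committed
        self.groups[token] = tuple(self.uncommitted)
        self.uncommitted = []
        self.in_flight.append(token)
        self.num_committed += 1
        return token

    def wait(self, n):
        # Counting-style wait: retire the oldest groups (FIFO) until
        # at most n groups remain in flight.
        while len(self.in_flight) > n:
            self.completed.append(self.in_flight.popleft())

    def wait_token(self, token):
        # Token-style wait expressed as a count. The count is the number of
        # groups committed after `token`, which the token alone does not carry:
        # it has to be read from the stage's implicit state.
        self.wait(self.num_committed - 1 - token)


stage = AsyncStageModel()
for i in range(4):
    stage.issue(f"copy_A[{i}]")
    stage.issue(f"copy_B[{i}]")
    stage.commit()
    stage.wait(2)  # allow up to two groups to stay in flight

print(stage.completed)  # [0, 1]
stage.wait_token(2)
print(stage.completed)  # [0, 1, 2]
```

In this toy setting, `wait_token` can only compute its count because the model keeps `num_committed` around. That is one way to read the point about implicit state: a token identifies a group, but not how many groups were committed after it.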
