mbs-octoml opened a new pull request #9038:
URL: https://github.com/apache/tvm/pull/9038


   Currently LowerTEPass (backend/te_compiler.cc) is a 'special' pass because it
   depends on a side-input DeviceMap. We'd like to remove that side-input, and
   instead recover the Device (and, ultimately, Target) for each (fused)
   primitive call from the AST alone.
   
   By doing so we also avoid needing to perform device planning twice:
    - It needs to be done before lowering so we know which primitives need
      to be compiled for which devices.
    - It then needs to be re-done after lowering and optimization as a prelude
      to memory planning.
   By baking the device plan into the AST we can simply do device planning
   before lowering, and run memory planning later, both as ordinary passes.
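
   To illustrate what "recovering the device from the AST alone" could look
   like, here is a minimal sketch on a toy expression tree. The classes Var,
   Call and OnDevice and the function collect_primitive_devices are
   hypothetical stand-ins invented for this example; they are not TVM's Relay
   classes or the code in this PR. The sketch only assumes that a sub-expression
   inherits the device of its enclosing scope unless an annotation overrides it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    op: str                      # name of a (fused) primitive
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class OnDevice:
    # Hypothetical stand-in for an "on_device" annotation: the wrapped
    # sub-expression is evaluated on `device` rather than the scope's device.
    body: "Expr"
    device: str


Expr = Union[Var, Call, OnDevice]


def collect_primitive_devices(
    expr: Expr,
    scope_device: str,
    out: Optional[List[Tuple[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Return (primitive name, device) pairs using only the tree itself."""
    if out is None:
        out = []
    if isinstance(expr, OnDevice):
        # The annotation changes the device for everything beneath it.
        collect_primitive_devices(expr.body, expr.device, out)
    elif isinstance(expr, Call):
        # No side-input map is consulted: the device comes from the scope.
        out.append((expr.op, scope_device))
        for arg in expr.args:
            collect_primitive_devices(arg, scope_device, out)
    return out


# add(x, on_device(multiply(y, z), "gpu")) with the enclosing scope on "cpu".
program = Call(
    "add",
    (Var("x"), OnDevice(Call("multiply", (Var("y"), Var("z"))), "gpu")),
)
plan = collect_primitive_devices(program, "cpu")
```

   In this toy, `plan` pairs "add" with "cpu" and "multiply" with "gpu", so a
   lowering step could pick what to compile for each device without a separate
   device map. A real plan would also need a copy at the cpu/gpu boundary, which
   this sketch leaves out.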
   
   While working on that issue we realized we currently have 3 'device
   planners':
    - transforms/device_annotation.cc, which supports only a small subset of
      Relay and uses a simple top-down algorithm to assign a device to every
      sub-expression.
    - analysis/context_analysis.cc, which makes a gallant effort to support
      most of Relay and is based on unification rather than a top-down
      algorithm, but handles higher-order functions by ad-hoc and fragile
      inlining.
    - transforms/annotate_target.cc, which works on Targets instead of
      Devices, but is otherwise like 'device planning'.
   We'd like to bring these together.
   
   In this PR we introduce a new transforms/device_planner.cc intended to
   replace transforms/device_annotation.cc and analysis/context_analysis.cc.
   We don't delete those two just yet since we need to switch all users off of
   them in the next PR. We also leave transforms/annotate_target.cc alone
   pending a proper RFC to bring devices and targets together sensibly, but
   have it firmly in our sights.
   
   transforms/device_planner.cc is based on analysis/context_analysis.cc, but
   is heavily reworked to:
    1. Handle starting from existing "on_device" annotations as well as existing
       "device_copy" calls.
    2. Be idempotent, with the idea we'll probably need to re-run it to 'refine'
       device planning to account for storage scopes.
    3. Robustly handle all of Relay, particularly higher-order functions. For
       that we replace the inlining approach in analysis/context_analysis.cc
       with a higher-order unification domain.
    4. Be a little more systematic with defaulting.
    5. Capture the result of the analysis within the AST as new "device_copy"
       calls at device boundaries, and new/replaced "on_device" calls wherever
       the device for a sub-expression is not already 'obvious' from the
       sub-expression's lexical scope.
    6. Provide helper visitors for passes which need to ask for the device for
       any sub-expression they are processing and/or preserve device information
       on rewrites. Those passes include:
        - backend/aot_executor_codegen.cc (AOTOnDemandAllocator)
        - backend/graph_plan_memory.cc (StorageAllocaBaseVisitor etc)
        - backend/te_compiler.cc (LowerTensorExprMutator)
        - backend/vm/lambda_lift.cc (LambdaLifter)
        - transforms/memory_alloc.cc (DialectRewriter)
        - transforms/to_a_normal_form.cc (Fill)
        - backend/vm/compiler.cc (VMFunctionCompiler)
       However, we won't change any of those in this PR.
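
   As a rough illustration of the "higher-order unification domain" in item 3,
   the sketch below unifies device domains with a small union-find. A domain is
   either first-order (a possibly unknown device) or higher-order (domains for a
   function's parameters and result). All names here (DeviceDomain, find, unify,
   device_of) are hypothetical and chosen for this example; this is not the code
   in transforms/device_planner.cc, only a toy showing why no inlining is needed
   for a function value to propagate device constraints.

```python
class DeviceDomain:
    """First-order domain (args is None) or higher-order function domain."""

    def __init__(self, device=None, args=None, result=None):
        self.device = device   # concrete device name, or None if unknown
        self.args = args       # list of DeviceDomain for parameters, or None
        self.result = result   # DeviceDomain for the function result, or None
        self.parent = self     # union-find parent pointer


def find(d):
    """Return the representative of d's equivalence class (path halving)."""
    while d.parent is not d:
        d.parent = d.parent.parent
        d = d.parent
    return d


def unify(a, b):
    """Constrain a and b to be the same domain, or raise on a conflict."""
    a, b = find(a), find(b)
    if a is b:
        return a
    if (a.args is None) != (b.args is None):
        raise ValueError("cannot unify first-order and higher-order domains")
    if a.args is not None:
        if len(a.args) != len(b.args):
            raise ValueError("arity mismatch")
        b.parent = a
        # Function domains unify pointwise on parameters and result.
        for x, y in zip(a.args, b.args):
            unify(x, y)
        unify(a.result, b.result)
        return a
    if a.device is not None and b.device is not None and a.device != b.device:
        raise ValueError(f"device conflict: {a.device} vs {b.device}")
    if a.device is None:
        a.device = b.device
    b.parent = a
    return a


def device_of(d):
    """Device of a first-order domain, or None if still unconstrained."""
    return find(d).device


# An identity function: its parameter and result share one domain.
x = DeviceDomain()
identity = DeviceDomain(args=[x], result=x)

# A call site views the callee as fn(gpu) -> unknown.
call_result = DeviceDomain()
call_site = DeviceDomain(args=[DeviceDomain(device="gpu")], result=call_result)
unify(identity, call_site)
resolved = device_of(call_result)
```

   Here `resolved` ends up as "gpu": the constraint flows from the argument,
   through the function's domain, to the call result. Domains that are still
   unknown after all constraints are applied would need a defaulting rule
   (item 4), which this sketch does not attempt.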
   
   See the draft #8788 for the end game; I'll be peeling PRs out of that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
