[GitHub] [tvm-rfcs] ccjoechou commented on pull request #48: [RFC][BYOC] Marvell ML/AI Accelerator Integration

GitBox Tue, 18 Jan 2022 16:28:38 -0800


ccjoechou commented on pull request #48:
URL: https://github.com/apache/tvm-rfcs/pull/48#issuecomment-1015958031



   @areusch - Thanks for replying.
   Please see my comments below regarding your questions.
   
   
     1.  Question: device planning, which I think maybe you're doing outside 
the typical TVM flow?
        *   If I understand your question correctly: Yes and no.
        *   We are using TVM relay flow to generate JSON meta files and IR 
sub-graphs of the network model
        *   We like to use TVM code-gen and runtime flow to generate binary to 
run inference for “llvm-part” of the network model
        *   But we also like to use our build-from-outside-TVM-flow, Marvell 
accelerator backend/code-gen component to generate binary for “Marvell-part” of 
the network model (to be run on Marvell accelerator)
                                                                  i.      Not 
right now, but in a future RFC, we can and like to provide APIs and library 
files so that we can embed Marvell backend/code-gen component into libtvm.so 
and within the typical TVM flow
   
        *   Not right now, but in a future RFC, when Marvell driver APIs and 
TVM-Marvell runtime & driver hookups are ready, we like to use the typical TVM 
flow (with Marvell modifications) to run “Marvell-part” computes of the network 
on Marvell HW accelerator directly and llvm-part of computes
     1.  Question: executor, which I think you may have re-implemented here.
        *   I believe that we implemented specializations of the current 
executor code in order to generate Marvell JSON meta files.
        *   It is possible that others may have also implemented “parts of” 
similar specializations in the last 6 months – if this is the case (and we can 
use them), we like to know how we can merge codebase
     2.  Question: to provide code links into your PoC if that would help me 
understand--I can do some targeted reading
        *   As listed in the RFC, our POC changes have been up-streamed to the 
TVM GitHub’s PR-9730.
   If you like to, you should also be able to git clone from 
https://github.com/ccjoechou/tvm.git and checkout & use the “byoc-mrvl” branch.
     3.  Question: it would also be great to spell out a plan for tests 
here--it seems like it might be possible to checkin your compiler/simulator 
into our CI, but could you be more explicit about your plans there?
        *   We have added infrastructure code and a test_mrvl suite to run the 
POC TVM-BYOC-Marvell flow
        *   Currently, there is a code-gen test, which can be run to use a 
pre-trained a ssd-resnet50 model - please see 
tvm/tests/python/contrib/test_mrvl/ test_mrvl_codegen.py and its 
test_ssd_resnet50_aot_json_codegen function
        *   Should also be able to run regular docker steps below to exercise 
the BYOC-Marvell flow to compile a ssd-resnet50 network to generate JSON meta 
files for Marvell accelerator:
   
   
     *   ./docker/bash.sh --name tvm_mrvl tlcpack/ci-cpu:v0.79 
./tests/scripts/task_config_build_cpu.sh
     *   ./docker/bash.sh --name tvm_mrvl tlcpack/ci-cpu:v0.79 
./tests/scripts/task_build.sh build -j10
     *   ./docker/bash.sh --name tvm_mrvl tlcpack/ci-cpu:v0.79 
./tests/scripts/task_ci_setup.sh
     *   ./docker/bash.sh --name tvm_mrvl tlcpack/ci-cpu:v0.79 
./tests/scripts/task_python_integration.sh
   
   
        *   For the last task_python_integration.sh suite, can edit the file in 
order to skip steps to run other test suites but focus on running only 
tests/python/contrib/test_mrvl:
   sudo pip3 install gluoncv
   run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-contrib 
tests/python/contrib/test_mrvl
   
   Pleas also see my comments in-line below.
   Let me raise a difference here:
   
     *   The TVM partition’s sub-graph seems to represent a relay function, 
which can include multiple frontend operators captured by utilizing the relay 
merge-composite patten
     *   The Marvell sub-graph is a connected graph of multiple relay 
merge-composite functions – I did not know how to include a Figure in the RFC 
file before (now I do). But if you look at the listed pre-RFC link, we did 
include figures at end of the corresponding pre-RFC on the discuss forum – 
please check the end of pre-RFC and its figure to see whether they can help 
explaining the definition of Marvell sub-graphs here. 
https://discuss.tvm.apache.org/t/pre-rfc-byoc-marvell-ml-ai-accelerator-integration/11691].
   
   Thanks again and please let us know, if you like to discuss more.
   
   
     *   Joe
   
   
   
   
   
   From: Andrew Reusch ***@***.***>
   Sent: Tuesday, January 18, 2022 12:48 PM
   To: apache/tvm-rfcs ***@***.***>
   Cc: Joe Chou ***@***.***>; Mention ***@***.***>
   Subject: [EXT] Re: [apache/tvm-rfcs] [RFC][BYOC] Marvell ML/AI Accelerator 
Integration (PR #48)
   
   External Email
   ________________________________
   
   @areusch requested changes on this pull request.
   
   
@ssdurako<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ssdurako&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=d2Phi5OhJYC30chQlDSTXGdBUHLcKOytTeAstHx3XVU&e=>
 
@ccjoechou<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ccjoechou&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=WZdEJI-EoLFqhJJcO9YZeCcSYSSC-s48_LgSB5F5K1c&e=>
 apologies for the long delay! i think we missed this one since it was mailed 
during TVMCon and also just before we all took off for the holidays. I'll try 
to be a bit better about reviewing this.
   
   Overall I have some understanding of your approach with this RFC. I'd like 
to further discuss some of the rationale behind:
   
     *   device planning, which I think maybe you're doing outside the typical 
TVM flow?
     *   executor, which I think you may have re-implemented here.
   
   I'm a bit low on bandwidth to read your full PoC PR. would you mind 
clarifying the RFC as a starting point (or feel free to provide code links into 
your PoC if that would help me understand--I can do some targeted reading, I'm 
just fairly busy for a full read-through right now)
   
   it would also be great to spell out a plan for tests here--it seems like it 
might be possible to checkin your compiler/simulator into our CI, but could you 
be more explicit about your plans there?
   
   also cc 
@comaniac<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_comaniac&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=5likf9V3g5a6XmvwPAr-ais9kJ9tL1Oe3UCcdIReMIQ&e=>
 
@mbs-octoml<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_mbs-2Doctoml&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=e3YsFOFCwaZu1YZYwMB9BPZ3m6b0Ne-2byTRo1cYgvQ&e=>
 
@Mousius<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Mousius&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=mqoZGl7ZzAvBheq3qBAww4oxaItbLwRtTPxO_jkI7w4&e=>
 
@junrushao1994<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_junrushao1994&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_
 
xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=MuKxC3uDBMZAx2b6eewBCqm_vk3zVN29bCKxGNYf8a0&e=>
 for further comments on BYOC, device planning, and support for custom executors
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787106344&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=R7fm2C2KsgY7__XRfP-OLUzHCTDIV2vV74gcKzdXtns&e=>:
   
   > +      conv2d + add + batch_norm + tuple.getitem(0) + relu
   
   +    * For the first Marvell-BYOC revision, at most one for-accelerator Mrvl 
subgraph and at most one for-TVM-target
   
   +      non-Mrvl subgraph (let's call this sub-graph B) can be identified; 
plus, the for-accelerator Mrvl subgraph can
   
   +      only use input tensor(s) of given pre-trained network as its 
subgraph’s input tensors
   
   +
   
   +* Do code-gen step for each for-accelerator Mrvl subgraph:
   
   +    * Marvell-BYOC-specific attributes are introduced for each 
composite-merged/fused Call node so that a Nodes-JSON
   
   +      file and a Constants-JSON file are produced for the Mrvl subgraph
   
   +
   
   +STEP (2) Run Mrvl-ML/AI Backend Compiler to generate model binary for each 
Mrvl subgraph
   
   +
   
   +* The Mrvl-ML/AI backend compiler will be distributed as an executable in 
the OCTEON SDK; and it can be used to read
   
   +  in Nodes-JSON and Constants-JSON files of each Mrvl subgraph as input 
meta-data in order to generate final instructions,
   
   +  in model binary file
   
   +
   
   +* Note: Mrvl-ML/AI backend compiler, which does accelerator-specific 
optimization and code generation, is not included
   
   what's the test plan for this RFC? Would it be possible to add the Marvell 
backend compiler and simulator to our ci images and run against it in CI?
   [ccjoechou writes: for this BYOC-Marvell RFC, the POC PR codebase only 
contains code to generate JSON meta files. We have up-streamed our test_mrvl 
test suite but only contains JSON codegen. In our next RFC, we will provide 
runtime & driver hookups. We are working on a Marvell backend package with 
Marvell backend code-gen and Marvell software simulator, which mimics a 
cycle-approximate Marvell HW accelerator. This package can become available 
later for external usage.
   Currently, we are having problems run TVM rust/cargo and can’t find useful 
document to debug issues – plus, tvm-build is owned by OctoML (not GitHub TVM, 
right?)]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787107277&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=dTFGJaoaEmF7zEr2T3eNCq4rM8GuJUGIcBCKJwUjp7I&e=>:
   
   > +STEP (2) Run Mrvl-ML/AI Backend Compiler to generate model binary for 
each Mrvl subgraph
   
   +
   
   +* The Mrvl-ML/AI backend compiler will be distributed as an executable in 
the OCTEON SDK; and it can be used to read
   
   +  in Nodes-JSON and Constants-JSON files of each Mrvl subgraph as input 
meta-data in order to generate final instructions,
   
   +  in model binary file
   
   +
   
   +* Note: Mrvl-ML/AI backend compiler, which does accelerator-specific 
optimization and code generation, is not included
   
   +  to upstream
   
   +
   
   +STEP (3a) or (3b) Run inference on the software Simulator or on the Mrvl 
ML/AI HW accelerator for the Mrvl subgraph
   
   +
   
   +* The Mrvl Software Simulator of the Mrvl ML/AI HW accelerator will be 
distributed as an executable in a Mrvl-ML/AI tar
   
   +  ball; and it can be used to read in input file(s) and the model binary to 
run inference for the Mrvl subgraph
   
   +
   
   +* Note: Mrvl ML/AI accelerator can run inference in either float16 mode or 
int8 quantization mode. For this RFC, we will
   
   +  focus only on float16 inference run
   
   just checking if this was the end of the sentence here
   [ccjoechou writes: yes]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787114611&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=ENAAXEINGM7EwToNOLBW3SyS_1du7HQF34XpbsKcVwc&e=>:
   
   > +          /* en_id=599 */) /* ty=Tensor[(256, 1568), float32] */;
   
   +      %11 = nn.dense(%9, %10, units=None, out_dtype="float32", /* en_id=600 
*/) /* ty=Tensor[(1, 256), float32] */;
   
   +      %12 = add(%11, meta[relay.Constant][5] /* ty=Tensor[(256), float32] 
*/,
   
   +          /* en_id=601 */) /* ty=Tensor[(1, 256), float32] */;
   
   +      %13 = nn.relu(%12, /* en_id=602 */) /* ty=Tensor[(1, 256), float32] 
*/;
   
   +      %14 = transpose(meta[relay.Constant][6] /* ty=Tensor[(256, 10), 
float32] */, axes=[1, 0],
   
   +          /* en_id=675 */) /* ty=Tensor[(10, 256), float32] */;
   
   +      %15 = nn.dense(%13, %14, units=None, out_dtype="float32", /* 
en_id=676 */) /* ty=Tensor[(1, 10), float32] */;
   
   +      add(%15, meta[relay.Constant][7] /* ty=Tensor[(10), float32] */, /* 
en_id=677 */) /* ty=Tensor[(1, 10), float32] */
   
   +}
   
   +
   
   +```
   
   +
   
   +* We can get to the following one Mrvl subgraph by applying the default 
strategy.
   
   +    * in the mrvl.py file: the compute_two_subgraphs() function of the 
class MrvlIRGraphUtils is used
   
   +      to create mod_mrvl_subgraph and mod_non_mrvl_subgraph for
   
   could you clarify this sentence?
   
   [ccjoechou writes: did not know how to include a Figure in the RFC file – 
but I did include figures at end of the corresponding pre-RFC on the discuss 
forum – please check the end of pre-RFC and its figure to see whether they can 
help explaining the definition of Marvell sub-graphs here. 
https://discuss.tvm.apache.org/t/pre-rfc-byoc-marvell-ml-ai-accelerator-integration/11691]
   
   
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787115633&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=NMd2x1139DrgTciUNgZ8CD9PiwmazbEys_2ZMegmtEU&e=>:
   
   > +      %13 = nn.relu(%12, /* en_id=602 */) /* ty=Tensor[(1, 256), float32] 
*/;
   
   +      %14 = transpose(meta[relay.Constant][6] /* ty=Tensor[(256, 10), 
float32] */, axes=[1, 0],
   
   +          /* en_id=675 */) /* ty=Tensor[(10, 256), float32] */;
   
   +      %15 = nn.dense(%13, %14, units=None, out_dtype="float32", /* 
en_id=676 */) /* ty=Tensor[(1, 10), float32] */;
   
   +      add(%15, meta[relay.Constant][7] /* ty=Tensor[(10), float32] */, /* 
en_id=677 */) /* ty=Tensor[(1, 10), float32] */
   
   +}
   
   +
   
   +```
   
   +
   
   +* We can get to the following one Mrvl subgraph by applying the default 
strategy.
   
   +    * in the mrvl.py file: the compute_two_subgraphs() function of the 
class MrvlIRGraphUtils is used
   
   +      to create mod_mrvl_subgraph and mod_non_mrvl_subgraph for
   
   +
   
   +```
   
   +    def @main(%permute_input: Tensor[(1, 1, 28, 28), float32]) -> 
Tensor[(1, 10), float32] {
   
   +      %0 = @tvmgen_mrvl_main_0(%permute_input, /* en_id=4136 */) /* 
ty=Tensor[(1, 28, 28, 1), float32] */;
   
   above, the RFC discusses having exactly one Marvell and non-Marvell 
subcgraph, but here I see 8 different function calls. do you mean that there 
are two targets, and you partition the graph into 8 subgraphs, but each 
subgraph is assigned to one or the other target? (reading further, I can see 
this is not the case, but it would help with reader comprehension to clarify 
this example)
   
   [ccjoechou writes: We are talking about different definitions of 
“(sub-)graphs” here. In the TVM partition pass, TVM’s graph or sub-graph is a 
merge-composite IR function, which can contain a pre-define pattern of original 
frontend operators. In BYOC-Marvell RFC’s definition, a sub-graph is a 
connected graph of Marvell-merge-composite functions. For instance, a 
tvmgen_mrvl_main_4 (see below in original email), it is a TVM-partition 
sub-graph, which is a Marvell merge-composite function containing frontend 
operators: conv, add, batchnorm, tuple-get-item, relu. But a Marvell sub-graph 
contains, in the given test case, several Marvell merge-composite functions.]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787117464&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=9V5lRxYmfESbgL3yM5Ilb_xMOPZt-l7-zSzW5591Rko&e=>:
   
   > +      %3 = @tvmgen_mrvl_main_3(%2, /* en_id=4139 */) /* ty=Tensor[(1, 14, 
14, 32), float32] */;
   
   +      %4 = @tvmgen_mrvl_main_4(%3, /* en_id=4140 */) /* ty=Tensor[(1, 7, 7, 
32), float32] */;
   
   +      %5 = @tvmgen_mrvl_main_5(%4, /* en_id=4141 */) /* ty=Tensor[(1, 
1568), float32] */;
   
   +      %6 = @tvmgen_mrvl_main_6(%5, /* en_id=4142 */) /* ty=Tensor[(1, 256), 
float32] */;
   
   +      @tvmgen_mrvl_main_7(%6, /* en_id=4143 */) /* ty=Tensor[(1, 10), 
float32] */
   
   +    }
   
   +```
   
   +
   
   +* In the above Mrvl subgraph, it is formed by "not-yet optimized Marvell 
(backend) layers". For example,
   
   +    tvmgen_mrvl_main_0 to tvmgen_mrvl_main_7 are composited/fused Marvell 
layers.
   
   +    * In the mrvl.mrvl_pattern_table() function, fusing patterns have been 
defined in order to composite
   
   +      original IR nodes into Marvell backend layers.
   
   +    * For example, the following 3 IR call nodes (nn.conv2d + nn.bias_add + 
nn.relu) in the original IR graph
   
   +      are composited into one Marvell layer: tvmgen_mrvl_main_1, 
conceptually speaking.
   
   +```
   
   +      # from original IR graphs
   
   this process looks rather similar to the device planning pass used in 
tvm.relay.build. are they the same? if not, could you motivate why you don't 
want to reuse that one?
   [ccjoechou: sorry I am not sure what you meant by “device planning pass”? We 
have been following what others did in tvm/python/tvm/relay/op/contrib by 
utilizing relay passes (for example, ConvertLayout, MergeComposite, 
AnnotateTarget, and etc.). Please note that in this RFC, we only want to 
generate JSON meta files and we are not ready to propose/up-stream our runtime 
& driver hookups yet).]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787129775&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=mipKEVzGwI-kUn_2n_Pb71FU6tS4AsBlfYeJzyoNlJg&e=>:
   
   > +
   
   +      %0 = @tvmgen_mrvl_main_0(%permute_input, /* en_id=4136 */) /* 
ty=Tensor[(1, 28, 28, 1), float32] */;
   
   +      %1 = @tvmgen_mrvl_main_1(%0, /* en_id=4137 */) /* ty=Tensor[(1, 28, 
28, 64), float32] */;
   
   +      %2 = @tvmgen_mrvl_main_2(%1, /* en_id=4138 */) /* ty=Tensor[(1, 14, 
14, 64), float32] */;
   
   +      %3 = @tvmgen_mrvl_main_3(%2, /* en_id=4139 */) /* ty=Tensor[(1, 14, 
14, 32), float32] */;
   
   +      %4 = @tvmgen_mrvl_main_4(%3, /* en_id=4140 */) /* ty=Tensor[(1, 7, 7, 
32), float32] */;
   
   +
   
   +      def @tvmgen_mrvl_main_0(%mrvl_0_i0: Tensor[(1, 1, 28, 28), float32], 
Inline=1, Compiler="mrvl",
   
   +          global_symbol="tvmgen_mrvl_main_0", Primitive=1) -> Tensor[(1, 
28, 28, 1), float32] {
   
   +        layout_transform(%mrvl_0_i0, src_layout="NCHW", dst_layout="NHWC",
   
   +            /* en_id=3334 */) /* ty=Tensor[(1, 28, 28, 1), float32] */
   
   +      }
   
   +```
   
   +
   
   +* Currently, in order for the following Marvell classes/functions to 
identify a Mrvl subgraphs and a non-Mrvl
   
   +  subgraph from the layout-converted, composited/fused IR graph, we are 
utilizing the unique en_id attribute
   
   could you motivate the naming of en_id a bit? i recognize this is a common 
thing, but it might be nice to choose a slightly more specific name
   [ccjoechou writes: en_id as ExprNode ID. It is an extra field, which has 
been defined in the include/./tvm/ir/expr.h file for the RelayExprNode or just 
ExprNode class.]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787130884&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=Ew6gvJZix-z7H4QsP5VlpdhjQ4fJ-0-iezoY2L2FyAk&e=>:
   
   > +            mod = seq(mod)
   
   +        return mod
   
   +
   
   +    mod_new = tvm.IRModule(mod_mrvl.functions, mod_mrvl.type_definitions)
   
   +    mod_new["main"] = MrvlSubgraphToRevert(mrvl_layers_in_mrvl_subgraph, 
mod_mrvl).visit(mod_mrvl["main"])
   
   +    mod_new = relay.transform.RemoveUnusedFunctions()(mod_new)
   
   +    mod_new = relay.transform.InferType()(mod_new)
   
   +    mod_new = run_opt_pass(mod_new, relay.transform.DefuseOps())
   
   +    mod_new = run_opt_pass(mod_new, 
relay.transform.ConvertLayout({"nn.conv2d": ["NCHW", "OIHW"], "nn.max_pool2d": 
["NCHW"]}))
   
   +    mod_new = run_opt_pass(mod_new, relay.transform.SimplifyExpr())
   
   +    mod_new = run_opt_pass(mod_new, 
relay.transform._ffi_api.DropNoopTranspose())
   
   +    mod_new = run_opt_pass(mod_new, relay.transform.InferType())
   
   +    return mod_new
   
   +```
   
   +
   
   +* Marvell-specific graph executor codegen, We have defined call backs and 
extension functions in the following files:
   
   could you motivate this further? it's hard to understand why you need to 
output your own JSON format without some explanation here.
   [ccjoechou writes: the above code block is not to output our own JSON 
format; instead, it is to “revert” a sub-graph, which went over Marvell passes 
(e.g., ConvertLayout, MergeComposite, AnnotateTarget, and etc), back to its 
original, say, llvm-IR graph. Hence, we are: defuse-ops (opposing to 
Merge-Composite), reverted ConvertLayout, and etc. Motivation for reverting 
back to this llvm-part subgraph is to allow this llvm-part subgraph to go 
through TVM llvm-flow to generate runtime binary.]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787131656&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=rToD4fZ9EIg8eIkLKttKsDNIv0v45-SouZi0No4Qt5g&e=>:
   
   > +```
   
   +
   
   +* the need to link between pre-trained model and final Marvell backend 
layer - for instance, through tvm_custom
   
   +    * We did not include prototype code in PR-9730 but intend to provide 
our sample changes in another RFC and PR.
   
   +
   
   +
   
   +# Drawbacks
   
   +[drawbacks]: #drawbacks
   
   +
   
   +* We haven't identified any major *not* do items. Several other designs are 
by choices - that is we understand that
   
   +  there are benefits for doing or benefits for not-doing.
   
   +
   
   +# Rationale and alternatives
   
   +[rationale-and-alternatives]: #rationale-and-alternatives
   
   +
   
   +* We follow the TVM BYOC framework to enable BYOC Marvell flow without 
impacting any TVM core features.
   
   it seems like there has been some impact to the GraphExecutor, and I think 
one point of confusion here is whether it was necessary to do that or whether 
you could have handled the additional runtime complexity inside a 
Marvell-specific runtime.Module. could you explain a bit further here?
   
   [ccjoechou writes: I do not see GraphExecutor term in above. Please provide 
an example or point us to a TVM file so we can understand your comment a bit 
more. Thanks.]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787132217&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=JRaYKexNQLso0O88polwIXQyGMTNHhdyLbV1bKRfSrM&e=>:
   
   > +  bypassing the tvm/Jenkinsfile's tests/scripts/task_rust.sh step. We 
will need help to re-enable the step.
   
   +
   
   +* We like to duplicate the Jenkins environment in order to run 
tvm/Jenkinsfile as is, but, we ran into many issues.
   
   +  Currently, we have a tvm-like Jenksinsfile environment to only run a 
subset of test suites using a modified
   
   +  Jenkinsfile.
   
   +
   
   +* We have identified a need to allow a call-back function to be registered 
when generating Mrvl-BYOC-specific
   
   +  Nodes-JSON file. We are trying to follow TVM Python/CPP-CB style as much 
as possible. But, since our callback
   
   +  function 
tvm/src/relay/backend/contrib/mrvl/graph_executor_codegen_mrvl.cc::GetExternalJSON()
 function is using
   
   +  non-simple argument types, we need help from TVM community to provide 
suggestions/guidelines in order to make
   
   +  new CB code better to meet TVM community requirements here.
   
   +
   
   +* For one Mrvl-BYOC relay transformation pass, we have identified a need to 
inject a (global) expr node ID for the
   
   +  RelayExprNode class and its derived classes: Tuple and CallNode, so that 
during the transformation pass, we can
   
   +  uniquely identify each Tuple or CallNode object. Again, we need help from 
TVM community to provide
   
   +  suggestions/guidelines here in order to know whether this is one of the 
best ways to achieve the Mrvl-BYOC need.
   
   i think it would help to spell out why you guys need to be able to identify 
each expression here.
   
   [ccjoechou writes: yes but not just us but a data-scientist customer who is 
using TVM flow may like to know, for example, the linkages between the runtime 
performance #s and their corresponding user frontend model’s operators (e.g., 
each expression).]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787132545&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=dIcIPdSahtIX9piD7O2f_UhKljEWi-K6XvL8MI3svgY&e=>:
   
   > +  RelayExprNode class and its derived classes: Tuple and CallNode, so 
that during the transformation pass, we can
   
   +  uniquely identify each Tuple or CallNode object. Again, we need help from 
TVM community to provide
   
   +  suggestions/guidelines here in order to know whether this is one of the 
best ways to achieve the Mrvl-BYOC need.
   
   +
   
   +* We also identified a need to maintain linkages between 
(operator-)information described in the original, given
   
   +  pre-trained network model and the code-gen JSON files so that the 
compiler backend will be able to report user-level
   
   +  (e.g., meaningful-to-user) messages regarding the given pre-trained 
network. For instance, in the
   
   +  tvm/python/tvm/relay/frontend/onnx.py and common.py files, we can see 
user-level information being captured using
   
   +  “tvm_custom” related code as in original onnx.py file for the given 
pre-trained network; but, in common.py, the code
   
   +  later drops the linkage, via attrs.pop(“tvm_custom”), and does not pass 
the linkage onto the initial relay IR graph.
   
   +  We have a draft solution to maintain linkages between the given 
pre-trained network model and its relay IR graph
   
   +  (using expr node ID and tvm custom ID, plus, a few utility functions), 
but would like to know whether the TVM
   
   +  community has any better or work-in-progress resolution.
   
   +
   
   +* When using TVM RPC code to exercise and run inference on a remote-hosted 
Mrvl ML/AI HW accelerator for the Mrvl
   
   +  subgraph, we ran into one minor issue and have made local TVM RPC 
enhancement so that, when a TVM RPC client sends
   
   could you explain the nature of the problem that requires the client to know 
the absolute path?
   
   [ccjoechou writes: First, the TVM RPC server choses path for any uploaded 
file under tmp randomly (which can be good to reduce possible security 
problem). But, in our use case, we like to have the TVM RPC client to send a 
“runtime” command to the RPC server side to pre-process the just uploaded file 
before the file can be consumed autonomously by the RPC server using 
pre-defined script. We can’t find a way or via a TVM example, which shows how 
this can be done.]
   
   ________________________________
   
   In 
rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23discussion-5Fr787133542&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=9sG9W_t_KkSawUyjosauynkwCGnYieI8qZo2TOrwXQU&e=>:
   
   > +  pre-trained network model and the code-gen JSON files so that the 
compiler backend will be able to report user-level
   
   +  (e.g., meaningful-to-user) messages regarding the given pre-trained 
network. For instance, in the
   
   +  tvm/python/tvm/relay/frontend/onnx.py and common.py files, we can see 
user-level information being captured using
   
   +  “tvm_custom” related code as in original onnx.py file for the given 
pre-trained network; but, in common.py, the code
   
   +  later drops the linkage, via attrs.pop(“tvm_custom”), and does not pass 
the linkage onto the initial relay IR graph.
   
   +  We have a draft solution to maintain linkages between the given 
pre-trained network model and its relay IR graph
   
   +  (using expr node ID and tvm custom ID, plus, a few utility functions), 
but would like to know whether the TVM
   
   +  community has any better or work-in-progress resolution.
   
   +
   
   +* When using TVM RPC code to exercise and run inference on a remote-hosted 
Mrvl ML/AI HW accelerator for the Mrvl
   
   +  subgraph, we ran into one minor issue and have made local TVM RPC 
enhancement so that, when a TVM RPC client sends
   
   +  a file to the remote server, the TVM RPC client can know where the remote 
server saves the file on the remote machine.
   
   +  Since this is not directly related to this Mrvl-BYOC PR, we will find 
time to contribute this enhance back in another
   
   +  TVM PR soon.
   
   +
   
   +* In order for us to generate the constants-JSON file, we must “NOT” remove 
external params, which were stored in
   
   why is this? params passed in MetadataModule are meant for consumption only 
by the runtime.Module which defines them. it seems like perhaps you need to 
consume them at the executor level. could you explain that?
   
   [ccjoechou writes: We are using relay to generate JSON meta files 
representing the given network model in a way our backend code can process 
directly (e.g., only the marvell-part sub-graph(s). If we have include 100% of 
our backend code in the TVM codebase, then, we do not need to dump constants in 
JSON mega file; but, due to our backend code is built outside the typical TVM 
flow and can do other compile-time optimization including manipulating 
constants, we need constants JSON.]
   
   —
   Reply to this email directly, view it on 
GitHub<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tvm-2Drfcs_pull_48-23pullrequestreview-2D855912667&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=L2xG3uKvJ4yJ_ixp1fyRD1bRTpZgR-yWZtSVJPsEHLo&e=>,
 or 
unsubscribe<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AM636PACHANYNAATB4KA7B3UWXGYFANCNFSM5KJYF53A&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=NB9WVT4nlWSlIeHit8jurEkvilrrj22R7iAhBtdylbI&e=>.
   Triage notifications on the go with GitHub Mobile for 
iOS<https://urldefense.proofpoint.com/v2/url?u=https-3A__apps.apple.com_app_apple-2Dstore_id1477376905-3Fct-3Dnotification-2Demail-26mt-3D8-26pt-3D524675&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=-StSSo5XOXXHK4bMwp_B8eLiRGQ0DXvLD_42vrZOwyQ&e=>
 or 
Android<https://urldefense.proofpoint.com/v2/url?u=https-3A__play.google.com_store_apps_details-3Fid-3Dcom.github.android-26referrer-3Dutm-5Fcampaign-253Dnotification-2Demail-2526utm-5Fmedium-253Demail-2526utm-5Fsource-253Dgithub&d=DwMFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=84zI-iH2xz28Q_xujSGbjYaq38CgcRgl_vGzCtF6TwQ&m=VNqzldKdQt96drtn_9Nfv7pyo4q1twvxtd6wMIR7FYI4cpvEKxaaVtBsanbms18J&s=HJ1qy1TGjh0KDSnnPWyF-MoI-UYA3X9I_er2DAq4ICs&e=>.
   You are receiving this because you were mentioned.Message ID: ***@***.***>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm-rfcs] ccjoechou commented on pull request #48: [RFC][BYOC] Marvell ML/AI Accelerator Integration

Reply via email to