[GitHub] [tvm-rfcs] tqchen commented on a diff in pull request #97: [RFC] Further Unify Packed and Object in TVM Runtime

GitBox Mon, 09 Jan 2023 10:07:07 -0800


tqchen commented on code in PR #97:
URL: https://github.com/apache/tvm-rfcs/pull/97#discussion_r1064942274



##########
rfcs/0097-unify-packed-and-object.md:
##########
@@ -0,0 +1,712 @@
+Authors: @cloud-mxd, @junrushao,  @tqchen
+
+- Feature Name: Further Unify Packed and Object in TVM Runtime
+- Start Date: 2023-01-08
+- RFC PR: [apache/tvm-rfcs#0097](https://github.com/apache/tvm-rfcs/pull/97)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+## Summary
+
+This RFC proposes to further unify PackedFunc and Object in the TVM runtime. 
Key improvements include unifying `type_code`, solidifying AnyValue support 
for both stack and object values, opening the door to small-string support and 
NLP preprocessing, and enabling a universal container.
+
+## Motivation
+
+FFI is one of the main components of TVM. We use the PackedFunc convention to 
safely type-erase values and pass them around. To support a general set of 
data structures for compilation purposes, we also have an Object system, which 
the Packed API is made aware of. 
+
+Object supports reference counting, dynamic type casting and checking, as well 
as structural equality/hashing/serialization in the compiler.
+Right now, most of the things of interest are Objects, including containers 
like Map and Array, PackedFunc itself, Module, and various IR objects.
+Object requires heap allocation and reference counting, which can be optimized 
through pooling. Objects are suitable for most deep learning runtime needs, 
+such as containers, as long as allocations are infrequent.
+In the meantime, we still need to operate on values on the stack, 
specifically when we pass around int and float values. 
+It can be wasteful to invoke heap allocation, or even pooling, if the 
operations are meant to be low cost. As a result, the FFI mechanism also 
provides additional ways to pass **stack values** around directly without an 
object.
+
+This post summarizes lessons from our own work and other related projects, as 
well as needs around the overall TVM FFI and Object system, and seeks to use 
these lessons to further solidify the current system. We summarize some of the 
needs and observations as follows:
+
+### N0: First class stack small string and AnyValue
+
+Data preprocessing is an important part of an ML pipeline. Preprocessing in 
NLP involves strings and containers. Additionally, when translating programs 
written by users (in Python), there may not be sufficient type annotations. 
+
+The programs below come from real production code in matxscript for NLP 
preprocessing:
+
+```cpp
+// This can be part of data processing code translated 
+// from user code that comes without type annotations
+AnyValue unicode_split_any(const AnyValue& word) {
+  List ret;
+  for (size_t i = 0; i < word.size(); ++i) {
+     AnyValue res = word[i];
+     ret.push_back(res);   
+  }
+  return ret;
+}
+// This is a better-typed version of the same code.
+// Note that word[i] returns a UCS4String container to match Python semantics.
+// UCS4String stores Unicode in fixed-length 4-byte values to ease random
+// access to the elements. 
+List<UCS4String> unicode_split(const UCS4String& word) {
+  List<UCS4String> ret;
+  for (size_t i = 0; i < word.size(); ++i) {
+     UCS4String res = word[i];
+     ret.push_back(res);   
+  }
+  return ret;
+}
+```
+We would like to highlight a few key points from the above programs: 
+- We need a base AnyValue that supports both stack values and objects.
+    - This provides a safety net for translation.
+- AnyValue needs to accommodate small strings (on the stack) to enable fast 
string processing. Specifically, note that this example creates a 
`UCS4String res` for every character of the word. If we run a heap allocation 
for each invocation, or even just do reference counting, this can become 
expensive. The same principle generalizes to the need for fast processing of 
other on-stack values. 
+
+
+While it is possible to rewrite the program with stronger typing to get more 
efficient code, it is important to acknowledge the need for efficient 
type-erased runtime support (with minimum overhead), especially given that 
many ML users come from Python.
+
+### N1: Universal Container
+
+In the above example, it is important to note that the container `List` should 
hold any value. While it is possible to also provide different variants of 
specialized containers (such as `vector<int>`), to interact with a language 
like Python it would be nice to have a single universal container across the 
codebase. We have also experienced similar issues in our compilation stack. As 
an example, while it is possible to use Array to hold IR nodes such as Expr, we 
cannot use it to hold POD int values or other POD data types such as 
DLDataType.
+
+Having an efficient universal container also helps to simplify conversions 
across languages. For example, a list from Python can be turned into a single 
container without worrying about its content type. The execution runtime will 
also be able to directly leverage the universal container to support all 
possible cases that a developer might write. 
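
To make the idea of a universal container concrete, here is a minimal sketch 
in standard C++17. It is not the container proposed by this RFC: `AnySketch` 
and `ListSketch` are hypothetical names, and `std::variant` merely stands in 
for a type-erased value that can hold POD values and strings side by side.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a type-erased value: one slot that can hold
// nothing, an integer, a float, or a string.
using AnySketch = std::variant<std::monostate, int64_t, double, std::string>;

// One container type for every element type, instead of vector<int>,
// vector<string>, and so on.
using ListSketch = std::vector<AnySketch>;

// Sums only the integer elements of a mixed list and skips everything else.
inline int64_t SumInts(const ListSketch& xs) {
  int64_t total = 0;
  for (const AnySketch& x : xs) {
    if (const int64_t* p = std::get_if<int64_t>(&x)) {
      total += *p;
    }
  }
  return total;
}
```

A caller can push integers, floats, and strings into the same `ListSketch` 
and hand it to code that only inspects the element types it cares about.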
+
+### N2: Further Unify POD Value, Object and AnyValue
+
+TVM currently does have an AnyValue. Specifically, `TVMRetValue` is used to 
hold the managed result of a C++ PackedFunc return and can serve as AnyValue. 
Additionally, if the value is an object, `ObjectRef` serves as a nice handle 
that comes with various mechanisms, including structural equality and hashing.
+We can adopt a process called 
[boxing](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/boxing-and-unboxing)
 that enables most of the runtime containers to store values as objects.
+If we create a boxed Object for each kind of stack value, e.g. Integer to 
represent int, we will be able to effectively represent every value as an 
Object as well.
+Both TVMRetValue and Object use a code field at the beginning of the data 
structure to identify the type. TVMRetValue's code is statically assigned; 
Object's code contains a statically assigned segment for runtime objects and a 
dynamically assigned segment (indexed by type_key) for other objects.
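
As a rough illustration of boxing, the sketch below wraps a stack integer in 
a heap object whose header carries a type code. All names (`ObjectSketch`, 
`BoxedIntSketch`, `Box`, `Unbox`, `kBoxedIntCode`) are hypothetical, and 
`std::shared_ptr` stands in for TVM's intrusive reference counting, so this 
shows the concept rather than the runtime's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical type code; the RFC does not fix concrete values.
constexpr int32_t kBoxedIntCode = 1;

// Stand-in for Object: a header that starts with the type code.
struct ObjectSketch {
  int32_t type_code;
  explicit ObjectSketch(int32_t code) : type_code(code) {}
  virtual ~ObjectSketch() = default;
};

// A boxed integer: a heap object that wraps a stack value.
struct BoxedIntSketch : ObjectSketch {
  int64_t value;
  explicit BoxedIntSketch(int64_t v) : ObjectSketch(kBoxedIntCode), value(v) {}
};

// Boxing: stack value -> reference-counted object.
inline std::shared_ptr<ObjectSketch> Box(int64_t v) {
  return std::make_shared<BoxedIntSketch>(v);
}

// Unboxing: check the type code, then read the stack value back.
inline int64_t Unbox(const std::shared_ptr<ObjectSketch>& obj) {
  assert(obj != nullptr && obj->type_code == kBoxedIntCode);
  return static_cast<const BoxedIntSketch&>(*obj).value;
}
```

Every `Box` call pays for a heap allocation and a reference count, which is 
the cost the stack-value path is meant to avoid.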
+
+There are two interesting regimes of operation that come with ObjectRef and 
AnyValue.
+
+- R0: On one hand, if we are operating in a regime with no need for frequent 
stack value operations, it is desirable to simply use Object. An object 
reference is compact in a register (the size of a pointer, which is 8 bytes on 
modern 64-bit machines and 4 bytes on 32-bit machines), and it makes it easy to 
obtain underlying container pointers for weak references.
+    
+    ```cpp
+    void ObjectOperation(ObjectRef obj) {
+      if (const auto* int_ptr = obj.as<IntImmNode>()) {
+        LOG(INFO) << int_ptr->value;
+      }
+    }
+    ```
+    
+- R1: On the other hand, when we operate on frequent processing that is also 
not well-typed (as in the `unicode_split_any` example), it is important to also 
support an AnyValue that comes with stack value support.
+
+As a point of reference, Python uses object as the base for everything, but 
that creates overhead for str and int (which we seek to eliminate). Java and 
C# support both stack values and their object counterparts. 
+Right now we have both mechanisms. It would be **desirable to further unify 
Object and AnyValue** to support both R0 and R1. Additionally, it would be nice 
to have automatic conversions if we decide that both mechanisms are supported. 
Say a caller passes in a boxed int value: the callee should be able to easily 
get an int out of it (or treat it as an int) without having to do explicit 
casting. The same routine can then be implemented via either R0 or R1 in a way 
that is transparent to the caller.
+
+- This is also important for compilers and runtimes, as different compilers 
and runtimes might have their own considerations when operating under R0/R1.
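
One way such an automatic conversion could work is to dispatch on the type 
code inside a conversion operator. The sketch below is only an illustration 
under that assumption: `AnyViewSketch`, `BoxedIntSketch`, `AddOne`, and the 
code values are hypothetical, and the view is non-owning to keep reference 
counting out of the picture.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type codes; the RFC does not fix concrete values.
constexpr int32_t kIntCode = 1;
constexpr int32_t kBoxedIntCode = 2;

// Stand-in for a boxed int object (R0 style).
struct BoxedIntSketch {
  int32_t type_code = kBoxedIntCode;
  int64_t value = 0;
};

// Non-owning any-value that holds either a stack int or a pointer to a
// boxed int, tagged by type_code.
struct AnyViewSketch {
  int32_t type_code;
  union {
    int64_t v_int64;
    const BoxedIntSketch* v_handle;
  };

  AnyViewSketch(int64_t v) : type_code(kIntCode), v_int64(v) {}
  AnyViewSketch(const BoxedIntSketch& b)
      : type_code(kBoxedIntCode), v_handle(&b) {}

  // The callee asks for an int; the conversion unboxes when needed.
  operator int64_t() const {
    if (type_code == kIntCode) return v_int64;
    assert(type_code == kBoxedIntCode);
    return v_handle->value;
  }
};

// The callee is written once and works for both calling styles.
inline int64_t AddOne(AnyViewSketch x) {
  int64_t xval = x;
  return xval + 1;
}
```

The caller may pass either a plain `int64_t` or a `BoxedIntSketch`; `AddOne` 
does not need to know which one it received.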
+
+## Guide-level explanation and Design Goals
+
+We have the following design goals:
+
+- G0: Automatic switching between object-focused scenarios and stack-mixed 
scenarios that require AnyValue.
+- G1: Enable efficient string processing, specifically small-string support 
for NLP use cases.
+- G2: Enable an efficient universal container (e.g. a common container for 
List/Array that stores everything).
+  - Note that this does not prevent us from creating specialized types such as 
`List<String>`, as Java does, except that 
+    they still share the same underlying container.
+  - Array will share the same container with List to avoid conversion cost.
+- G3: Reduce concept duplication (type_code) and provide a unified approach 
for POD values and object values (including boxing and unboxing).
+
+```cpp
+// First class any value
+AnyValue unicode_split_any(const AnyValue& word) {
+  // universal container
+  List ret;
+  for (size_t i = 0; i < word.size(); ++i) {
+     // efficient small string support
+     AnyValue res = word[i];
+     ret.push_back(res);   
+  }
+  return ret;
+}
+
+// Unify object and POD value handling
+// passing a boxed int object to an int function and getting the int out
+// automatically without explicit conversion
+int MyIntFunc(AnyValue x) {
+  int xval = x;
+  return xval + 1;
+}
+
+int Caller(Map<String, BoxInt> dict) {
+  BoxInt x = dict["x"];
+  return MyIntFunc(x);
+}
+```
+
+Most of the goals are demonstrated in the above example program. We will 
outline the detailed design in the next section.
+
+## Reference-level Implementation
+
+This section outlines the main design points. We also list design choices and 
discuss the recommended choices in the rationale and alternatives section.
+
+### D0: Key Data Structures
+
+The program below gives an outline of the overall data structure choices.
+
+```cpp
+
+// Object is the same as the current object
+// We list it here for reference
+struct Object {
+  // 4 bytes type code
+  // This is a common header with AnyPODBase_
+  int32_t type_code;
+  // 4 bytes ref counter 
+  RefCounterType<int32_t> ref_counter;
+  // 8 bytes deleter
+  typedef void (*FDeleter)(Object* self);
+  FDeleter deleter; 
+  // Rest of the sections.
+};
+
+// Common value of Any
+struct AnyPODBase_ {
+  // type code, this is a common header with Object.
+  int32_t type_code;
+  // 4 bytes of padding; can be used to store the number of bytes in a small str
+  int32_t small_len;
+  // 8 bytes field storing variant
+  // v_handle can be used to store Object*
+  union {
+    int64_t v_int64;
+    double  v_float64;
+    void*   v_handle;
+    char    v_bytes[8];
+    // UCS4 string and Unicode
+    char32_t v_char32[2];
+  };
+};
+
+// Managed reference of Any value.
+// Copy will trigger ref counting if the
+// underlying value is an object.
+struct AnyValue : public AnyPODBase_ {
+};
+
+// "View" value to any value. Copy will not
+// trigger reference counting.
+struct AnyView : public AnyPODBase_ {
+};
+
+// An any value with extra padding data
+// can be used to store larger small str
+template<int num_paddings> 
+struct AnyPad : public AnyValue {
+  union {
+    char v_pad_bytes[num_paddings * 8];
+    // used to support UCS4 string and unicode.
+    char32_t v_pad_char32[num_paddings * 2];
+  };
+};
+```
+
+The design above introduces the following key terms:
+
+- T0: Object: the intrusive-pointer-managed object, used by most containers.
+    - This is the same as the current Object; we list it here for clarity.
+- T1: AnyValue (aka TVMRetValue): stores both POD values and managed 
references to objects.
+    - By managed reference we mean that copy/destruction of an AnyValue will 
trigger a ref counter change if the stored value is an Object.
+- T2: AnyView (aka TVMArgValue): stores POD values and unmanaged pointers.
+- T3: AnyPad: an AnyValue with a larger padded size to accommodate on-stack 
values.
+    - When the initial value defaults to null, both AnyValue and AnyPad can 
choose to set small_len to the total number of bytes available. This lets us 
pass a small string back through the C API (without templates) by looking at 
the small_len field of an `AnyValue*` to decide the maximum number of bytes 
allowed.
+
+**Discussions**  The default size of AnyValue is 16 bytes. This means that for 
a small string, we can use the extra 8 bytes to store the string part (7 bytes 
if we need a trailing `\0`). If we go with UCS4, we can store two UCS4 chars 
without the trailing `\0`. The extra space may not be sufficient for some 
small-string needs (as a reference, matxscript adopts extra padding of 8 bytes 
to accommodate small Unicode strings). AnyPad serves as another variation of 
AnyValue that contains extra stack space. AnyPad is intended to be usable 
interchangeably anywhere AnyValue appears. See also the follow-up sections on 
conversions and function signatures for how that works. One interesting future 
direction is that compilers can try different AnyPad sizes in code generation 
and autotune the default padding to best fit the application.
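
The size arithmetic above can be checked with a small sketch. It mirrors the 
`AnyPODBase_` layout from D0 under a hypothetical name (`AnyPODSketch`); the 
helper functions and the `kSmallStrCode` value are assumptions made for 
illustration and are not part of the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical type code for an inline small string.
constexpr int32_t kSmallStrCode = 2;

// Mirrors the AnyPODBase_ layout: 4-byte code, 4-byte length, 8-byte payload.
struct AnyPODSketch {
  int32_t type_code;
  int32_t small_len;
  union {
    int64_t v_int64;
    double v_float64;
    void* v_handle;
    char v_bytes[8];
  };
};

static_assert(sizeof(AnyPODSketch) == 16, "header plus payload is 16 bytes");

// Stores `s` inline when it fits in 7 bytes plus a trailing '\0'.
// Returns false when the string would need a heap object or extra padding.
inline bool TryStoreSmallStr(AnyPODSketch* out, const std::string& s) {
  if (s.size() > sizeof(out->v_bytes) - 1) return false;
  out->type_code = kSmallStrCode;
  out->small_len = static_cast<int32_t>(s.size());
  std::memset(out->v_bytes, 0, sizeof(out->v_bytes));
  std::memcpy(out->v_bytes, s.data(), s.size());
  return true;
}

// Reads the inline string back using the stored length.
inline std::string LoadSmallStr(const AnyPODSketch& v) {
  assert(v.type_code == kSmallStrCode);
  return std::string(v.v_bytes, static_cast<size_t>(v.small_len));
}
```

Strings of up to 7 bytes stay entirely inside the 16-byte value; anything 
longer is rejected here, which is the case that AnyPad's extra padding is 
meant to cover.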

Review Comment:
   I removed that per suggestion



