morningman commented on issue #6746:
URL: 
https://github.com/apache/incubator-doris/issues/6746#issuecomment-982366698


   ## Execution of Table Function Node
   
   A Table Function Node (TFN) contains one or more Table Functions. Its main 
logic is to expand each row received from its child node into multiple rows 
by applying the Table Functions, and to return the result to the upper layer. 
The main execution process is as follows:
   
   1. Get a row of data (the child row) from the child node.
   2. Pass the child row to each Table Function. Each Table Function computes 
a result set: S1, S2, ...
   3. Compute the Cartesian product of the child row and all result sets, and 
send it to the upper layer.
   
   For example, suppose the child row has 3 columns, k1, v1, and v2:
   
   | k1 | v1 | v2 |
   |---|---|---|
   | 1 | "a,b,c" | "4,5,6" |
   
   Two Table Functions, `explode_split(v1,',')` and `explode_split(v2,',')`, 
respectively produce the following result sets:
   
   | `explode_split(v1,',')` |
   |---|
   | "a" |
   | "b" |
   | "c" |
   
   | `explode_split(v2,',')` |
   |---|
   | "4" |
   | "5" |
   | "6" |
   
   The final Cartesian product result is:
   
   | k1 | `explode_split(v1,',')` | `explode_split(v2,',')` |
   |---|---|---|
   | 1 | "a" | "4" |
   | 1 | "a" | "5" |
   | 1 | "a" | "6" |
   | 1 | "b" | "4" |
   | 1 | "b" | "5" |
   | 1 | "b" | "6" |
   | 1 | "c" | "4" |
   | 1 | "c" | "5" |
   | 1 | "c" | "6" |
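
The three steps above can be sketched in a few lines of Python. This is only an illustration of the row expansion, not the BE implementation: `explode_split` here is a plain Python stand-in, and `expand_row` and the dictionary layout of the child row are hypothetical names chosen for the example.

```python
from itertools import product


def explode_split(value, delimiter):
    """Stand-in for explode_split: split a string into a result set."""
    return value.split(delimiter)


def expand_row(child_row, table_functions):
    """Expand one child row into the Cartesian product of its result sets."""
    # Step 2: each table function computes its own result set (S1, S2, ...).
    result_sets = [fn(child_row) for fn in table_functions]
    # Step 3: pair the child row's k1 value with every combination of results.
    return [(child_row["k1"], *combo) for combo in product(*result_sets)]


# Step 1: the child row from the example table.
child_row = {"k1": 1, "v1": "a,b,c", "v2": "4,5,6"}
table_functions = [
    lambda row: explode_split(row["v1"], ","),
    lambda row: explode_split(row["v2"], ","),
]
rows = expand_row(child_row, table_functions)
```

`rows` holds the nine tuples of the final table, in the same order: the last result set varies fastest.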
   
   ### Table Function Interface Design
   
   Doris does not currently support complex data types (such as Array), while 
a Table Function is essentially an expression that returns an array type. This 
implementation therefore gives Table Functions special treatment.
   
   1. DummyTableFunctions
   
       This is a placeholder (dummy) class. Its main purpose is to generate the 
scalar function signatures of the table functions on the BE side, which 
facilitates query planning on the FE side and lets the BE reuse the existing 
scalar function framework when it evaluates parameter expressions. In other 
words, during the planning and execution preparation stages of a query, a 
Table Function is treated as a scalar function.
   
   2. TableFunctionFactory
   
       The factory class for Table Functions. It returns a real Table Function 
instance based on the function name. Currently, functions are matched by 
name only.
       
   3. TableFunction
   
       The actual Table Function implementation class. It provides the 
following interfaces:
       
       1. prepare()/open()
       
           Preparation work, such as evaluating constant expressions and 
allocating memory for intermediate result sets.
       
       2. process(row)
   
           Calculate the Table Function result set from the incoming data 
(row).
       
       3. reset()
   
           Because multiple Table Functions are combined by Cartesian product, 
the result set of a function may be traversed multiple times. This method 
resets the result set cursor to the initial position so that the traversal can 
start again.
       
       4. get_value()
   
           Get the value at the position the cursor currently points to.
       
       5. forward()
   
           Move the cursor forward, after which get_value() returns the next 
value.
       
       6. close()
   
           Clean up after the function finishes executing.
           
       The subclasses of TableFunction are concrete implementations of each 
Table Function. The following three functions are implemented in this issue:
       
       1. `explode_split(str, delimiter)`
   
           Split str into multiple strings according to delimiter.
           
       2. `explode_json_array_xxx(json_str)`
   
           Expand a JSON array into its elements. Depending on the type of the 
elements in the array, xxx can be string, int, or double.
           
       3. `explode_bitmap(bitmap)`
   
           Expand a bitmap and return the value of each element in the bitmap.
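
To make the cursor-style interface more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the BE code: the class layout, the `eos()` helper (which is not in the interface list above), the `_REGISTRY` dictionary, and `create_table_function` are all hypothetical names. Only `process`, `reset`, `get_value`, and `forward` come from the description.

```python
class TableFunction:
    """Hypothetical base class mirroring the cursor-style interface."""

    def __init__(self):
        self._results = []
        self._cursor = 0

    def process(self, row):
        """Compute the result set for one input row (subclass-specific)."""
        raise NotImplementedError

    def reset(self):
        # Rewind the cursor so the result set can be traversed again.
        self._cursor = 0

    def eos(self):
        # Assumed helper: True once the cursor has passed the last value.
        return self._cursor >= len(self._results)

    def get_value(self):
        return self._results[self._cursor]

    def forward(self):
        self._cursor += 1


class ExplodeSplit(TableFunction):
    """Sketch of explode_split(str, delimiter) over a dict-shaped row."""

    def __init__(self, column, delimiter):
        super().__init__()
        self._column = column
        self._delimiter = delimiter

    def process(self, row):
        self._results = row[self._column].split(self._delimiter)
        self._cursor = 0


# Factory sketch: look up the implementation by function name only.
_REGISTRY = {"explode_split": ExplodeSplit}


def create_table_function(name, *args):
    if name not in _REGISTRY:
        raise ValueError(f"unknown table function: {name}")
    return _REGISTRY[name](*args)


fn = create_table_function("explode_split", "v1", ",")
fn.process({"k1": 1, "v1": "a,b,c"})
values = []
while not fn.eos():
    values.append(fn.get_value())
    fn.forward()
```

After the loop the cursor sits past the end of the result set. Calling `reset()` makes the same values available again, which is what the Cartesian product traversal relies on.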
           
   ### Table Function Node Interface Design
   
   Table Function Node inherits from Exec Node. It provides the following 
interfaces:
   
   1. init()
   
       Initialization work, such as obtaining the Table Function objects.
       
   2. prepare()/open()
   
       Preparation work, such as calling prepare()/open() on the expressions.
       
   3. get_next()
   
       Get a batch of results. get_next() of the child node is called first 
to fetch the child data, then the Table Function results are calculated, and 
the combined rows (child row joined with the result sets) are returned.
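
One way get_next() could drive several cursor-style table functions is to advance them like an odometer: move the last function's cursor, and when it runs off the end, reset it and move the previous one. The Python sketch below illustrates that idea under stated assumptions. `SplitFunction`, `expand_child_row`, `eos()`, and the dict-shaped rows are hypothetical, batching is reduced to a generator, and skipping a child row when some result set is empty is an assumption of this sketch rather than something the design above specifies.

```python
class SplitFunction:
    """Hypothetical cursor-style table function that splits one column."""

    def __init__(self, column, delimiter):
        self.column = column
        self.delimiter = delimiter
        self.values = []
        self.cursor = 0

    def process(self, row):
        self.values = row[self.column].split(self.delimiter)
        self.cursor = 0

    def reset(self):
        self.cursor = 0

    def eos(self):
        # Assumed helper: True once the cursor has passed the last value.
        return self.cursor >= len(self.values)

    def get_value(self):
        return self.values[self.cursor]

    def forward(self):
        self.cursor += 1


def expand_child_row(child_row, key_columns, functions):
    """Yield the Cartesian product rows for a single child row."""
    for fn in functions:
        fn.process(child_row)
    # Assumption: an empty result set yields no output rows for this child row.
    if any(fn.eos() for fn in functions):
        return
    keys = tuple(child_row[c] for c in key_columns)
    while True:
        yield keys + tuple(fn.get_value() for fn in functions)
        # Odometer step: advance the last cursor, carrying to the left.
        idx = len(functions) - 1
        while idx >= 0:
            functions[idx].forward()
            if not functions[idx].eos():
                break
            functions[idx].reset()
            idx -= 1
        if idx < 0:
            return  # every cursor wrapped around: the product is exhausted


def get_next(child_rows, key_columns, functions):
    """Pull child rows one by one and emit their expanded rows."""
    for child_row in child_rows:
        yield from expand_child_row(child_row, key_columns, functions)


child_rows = [{"k1": 1, "v1": "a,b,c", "v2": "4,5,6"}]
functions = [SplitFunction("v1", ","), SplitFunction("v2", ",")]
rows = list(get_next(child_rows, ["k1"], functions))
```

Because only cursors move, each result set is computed once per child row and never copied, which is why reset() exists on the interface.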

