NGA-TRAN commented on issue #4426:
URL: 
https://github.com/apache/arrow-datafusion/issues/4426#issuecomment-1332545456

   Thanks @alamb and @thinkharderdev for the feedback. Here is my proposal. I 
will update the main section above after we agree on this.
   
   We will use [Postgres's prepared statement 
syntax](https://www.postgresql.org/docs/current/sql-prepare.html#:~:text=A%20prepared%20statement%20is%20a,statement%20is%20planned%20and%20executed)
   
   Let us create a table for our examples
   
   ```SQL
   CREATE TABLE cpu ("user" INT, region VARCHAR, time TMESTAMP);
   
   -- insert some data
   
   SELECT * FROM cpu;
    user | region |        time         
   ------+--------+---------------------
       1 | east   | 2022-11-15 00:00:00
       2 | west   | 2022-11-30 19:10:25
   (2 rows)
   ```
   
   # New Feature: PREPARE a select statement and EXECUTE it
   
   We will support PREPARE and EXECUTE statement that are compatible with 
Postgres as follows:
   
   ### Example 1:
   
   ```SQL
   -- Create a prepared plan
   PREPARE my_plan_1(TIMESTAMP) AS 
      SELECT "user", region FROM cpu WHERE time = $1;
   
   -- Execute the plan
   EXECUTE my_plan_1('2022-11-30 19:10:25');
    user | region 
   ------+--------
       2 | west
   (1 row)
   ```
   
   ### Example 2
   
   ```SQL
   -- Create a prepared plan
   PREPARE my_plan_2(INT, INT) AS 
     SELECT region, count(*) FROM cpu GROUP BY region HAVING count(*) >= $1 and 
count(*) < $2 order by 1;
   
   -- Execute the plan
   EXECUTE my_plan_2(1, 10);
    region | count 
   --------+-------
    east   |     1
    west   |     1
   (2 rows)
   ```
   
   # Design proposal
   
   ## Prepare Statement
   
   sqlparser parses and provides output for 
PREPARE](https://github.com/sqlparser-rs/sqlparser-rs/blob/fa6bd01b19fabc9087d3b306e7c92f12cb8d368d/src/ast/mod.rs#L1444)
 and 
[EXECUTE](https://github.com/sqlparser-rs/sqlparser-rs/blob/fa6bd01b19fabc9087d3b306e7c92f12cb8d368d/src/ast/mod.rs#L1440)
 as follows
   
   ```rust
      Prepare {
           name: Ident,
           data_types: Vec<DataType>,
           statement: Box<Statement>,
       },
   
     Execute { name: Ident, parameters: Vec<Expr> },
   ```
   
   We will generate the corresponding struct when we generate the statement 
logical plan
   
   ```rust
   PreparedPlan {
      name: Indent,                            // name provided with the PREPARE
      params: Vec<DataType>,         // list of data types provided with the 
PREPARE
      plan:   LogicalPlan,                    // logical plan that understands 
Params defined below
   }
   
   struct Param {
      type: DataType,
      idx:   usize             // index to the vector `params`
   }
   ```
   
   Before showing examples of PreparedPlan, let us look into today logical 
plans.
   
   ```SQL
   -- Example 1:
   SELECT “user”, region FROM cpu WHERE time > 
to_timestamp('1970-01-01T00:00:00.000000120+00:00');
   
   Logical Plan:
    Projection: cpu.user, cpu.region
     Filter: cpu.time > totimestamp(Utf8("1970-01-01T00:00:00.000000120+00:00"))
       TableScan: cpu
   
   -- Example 2:
   SELECT region, count(*) FROM cpu GROUP BY region HAVING count(*) >= 1 and 
count(*) < 10 order by 1;
   
   logical plan:
    Sort: cpu.region ASC NULLS LAST
     Projection: cpu.region, COUNT(UInt8(1))
       Filter: COUNT(UInt8(1)) >= Int64(1) AND COUNT(UInt8(1)) < Int64(10)
         Aggregate: groupBy=[[cpu.region]], aggr=[[COUNT(UInt8(1))]]
           TableScan: cpu
   ```
   
   Let see the full output of a prepared statement:
   
   ### Example 1:
   ```SQL
   PREPARE my_plan_1(TIMESTAMP) AS 
      SELECT "user", region FROM cpu WHERE time = $1;
   ```
   
   The output will look like
   
   ```rust
   my_plan_1 = PreparedPlan {
       name: "my_plan_1",
       params: [Timestamp]     // vector of 1
       plan: Logical Plan
                     Projection: cpu.user, cpu.region
                        Filter: cpu.time > Param { type: Timestamp, idx: 0}   
// idx points to the first item of the params
                           TableScan: cpu
   }
   ```
   
   ### Example 2:
   
   ```SQL
   PREPARE my_plan_2(INT, INT) AS 
     SELECT region, count(*) FROM cpu GROUP BY region HAVING count(*) >= $1 and 
count(*) < $2 order by 1;
   ```
   
   The output will look like
   
   ```rust
   my_plan_2 = PreparedPlan {
      name: "my_plan_2",
      params: [Int64, Int64],
      plan: Logical plan 
                  Sort: cpu.region ASC NULLS LAST
                     Projection: cpu.region, COUNT(UInt8(1))
                        Filter: COUNT(UInt8(1)) >= Param { type: Int64, idx: 0 
}    // points to first item of the params
                                   AND COUNT(UInt8(1)) < Param { type: Int64, 
idx: 1 }  // points to second items of the params
                            Aggregate: groupBy=[[cpu.region]], 
aggr=[[COUNT(UInt8(1))]]
                              TableScan: cpu
   }
   ```
   
   **Question: How do we store all the generated prepared plans? In a hashMap?**
   
   ## Execute Plan
   
   sqlparser parses and provides output for  
[EXECUTE](https://github.com/sqlparser-rs/sqlparser-rs/blob/fa6bd01b19fabc9087d3b306e7c92f12cb8d368d/src/ast/mod.rs#L1440)
 as follows
   
   ```rust
     Execute { name: Ident, parameters: Vec<Expr> },
   ```
   We will search the the plan and replace Params in the Logical plan with the 
provided parameters. Then generate physical plan for it and execute.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to