xinyiZzz opened a new pull request, #30817:
URL: https://github.com/apache/doris/pull/30817

   ## Proposed changes
   
   # 1. Motivation
   PL-SQL (Stored procedure) is a collection of sql, which is defined and used 
similarly to functions. It supports conditional judgments, loops and other 
control statements, supports cursor processing of result sets, and can write 
business logic in SQL.
   
   Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, 
Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based 
on Hplsql to achieve compatibility with Stored procedures of database systems 
such as Oracle and PostgreSQL.
   
   Reference documentation:
   http://mail.hplsql.org
   
https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
   
   # 2. Implementation
   Take the following case as an example to explain the execution process of 
the stored procedure. It supports two execution methods: connecting to FE using 
the Mysql protocol and using scripts.
   ```
   CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
             select count(*) from test;
             select count(*) into result from test where k = name;
   END
   
   declare result INT default = 0;
   call A(‘xxx’, result);
   print result;
   ```
   
   ## 2.1 Use Mysql protocol connect FE to execute
   
![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd)
   
   1. Add procedure and persist the Procedure Name and Source (raw SQL) into 
Doris FE metadata.
   2. Call procedure, extract the actual parameter Value and Procedure Name in 
Call Stmt. Use Procedure Name to find the Source in the metadata, extract the 
Name and Type of the Procedure parameter, and match them with the actual 
parameter Value to form a complete variable <Name, Type, Value>.
   3. Execute Doris Statement
        - Use Doris Logical Plan Builder to parse the Doris Statement syntax in 
Source, replace parameter variables, remove the into variable clause, and 
generate a Plan Tree that conforms to Doris syntax.
        - Use stmtExecutor to execute SQL and encapsulate the query result set 
iterator into QueryResult.
        - Output the query results to Mysql Channel, or write them into Cursor, 
parameters, and variables.
        - Stored Programs compatible with Mysql protocol support multiple 
statements.
   4. Execute PL-SQL Statement
        - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement 
syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically 
reuse HplSQL.
   
   ## 2.2 Script execution
   Start reading and parsing SQL from the script, connect Doris FE to send 
Doris Statement through JDBC, process the returned Doris query result set 
locally, and return the final result.
   
   Instructions:
   1. Modify the jdbc path and username and password to connect to doris in the 
fe/conf/plsql-site.xml configuration file:
   ```
   <property>
      <name>plsql.conn.dorisconn</name>
      
<value>com.mysql.cj.jdbc.Driver;jdbc:mysql://fehost:queryport/database;user;password</value>
      <description>Doris jdbc connection</description>
   </property>
   ```
   2. Execute stored procedure sql through script:
   ```
   sh fe/bin/plsql.sh -f stored_procedure.sql
   ```
   
   Implementation process:
   
![image](https://github.com/apache/doris/assets/13197424/f23dc99f-d79a-45b5-9c6d-aae5801a0c93)
   
   It is similar to using the Mysql protocol to connect the FE execution 
process. The difference is that JDBC is used to execute the Doris Statement, 
and the created Stored procedure will not be persisted to the Doris FE 
metadata, but will be temporarily saved locally.
   
   # 3. TODO
   1. Doris Parser supports declare variable.
   2. Select Statement supports insert into variable.
   3. Parameters and fields have the same name.
   4. If Cursor exits halfway, will there be a memory leak?
   5. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the 
original SQL. Is there any problem with special characters?
   6. Supports complex types such as Map and Struct.
   7. Test syntax such as Package.
   8. Support UDF.
   9. Is it compatible with Oracle PL-SQL?
   
   # 4. Some questions
   1. JDBC does not support the execution of stored procedures that return 
results. You can only Into the execution results into a variable or write them 
into a table, because when multiple result sets are returned, JDBC needs to use 
the prepareCall statement to execute, otherwise the Statemnt of the returned 
result executes Finalize. Send EOF Packet will report an error;
   2. Use PL-SQL Cursor to open multiple Query result set iterators at the same 
time. Doris BE will cache the intermediate status of these Queries (such as 
HashTable) and query results until the Query result set iteration is completed. 
If the Cursor is not available for a long time Being used will result in a lot 
of memory waste.
   3. The current PL-SQL Statement is calculated in Doris FE, such as IF, Loop, 
Cursor, etc. This may affect other services on Doris FE and is limited by the 
performance of Doris FE. The creation of stored procedures will Forward Master 
FE, and execution can In the Observer, in the short term, consider expanding 
the number of FE Observer nodes to execute PL-SQL Statement. A better solution 
is to move it to Doris BE for execution.
   4. The format of the result returned by Doris Statement is ```xxxx\n, 
xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable 
values in an unformatted format, and JDBC cannot easily obtain them. Real 
results.
   
   # 5. Some thoughts
   The above execution of Doris Statement reuses Doris Logical Plan Builder for 
syntax parsing, parses it from top to bottom into a Plan Tree, and calls 
stmtExecutor for execution. PL-SQL replacement variables, removal of Into 
Variable and other operations are coupled in Doris syntax parsing. The 
advantage is that it is easier to It can be compatible with Doris grammar with 
a few changes, but the disadvantage is that it will invade the Doris grammar 
parsing process.
   HplSQL performs a syntax parsing independently of Hive to implement variable 
substitution and other operations, and finally outputs a SQL that conforms to 
Hive syntax. The following is a simple syntax parsing process for select, 
where, expression, table name, join, The parsing of agg, order and other 
grammars must be re-implemented. The advantage is that it is completely 
independent from the original system, but the changes are too complicated.
   
![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)
   
   ***
   
   # 1. 动机
   PL-SQL(Stored 
procedure)是一组sql的集合,定义及使用方式类似于函数。支持条件判断、循环等控制语句,支持游标处理结果集,可用sql的方式编写业务逻辑。
   
   Hive 使用 Hplsql 支持PL-SQL,同时很大程度兼容 Oracle, Impala, MySQL, Redshift, 
PostgreSQL, DB2 等,我们基于 Hplsql 在 Doris 支持PL-SQL,实现对 Oracle、PostgreSQL 
等数据库系统Stored procedure的兼容。
   
   参考文档:
   http://mail.hplsql.org  
   
https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
   
   # 2. 实现
   以下面的 case 为例,解释存储过程的执行流程,支持使用 Mysql 协议连接 FE 和使用脚本两种方式执行。
   ```
   CREATE OR REPLACE PROCEDURE  A(IN name STRING, OUT result int)
            select count(*) from test;
            select count(*) into result from test where k = name;
   END
   
   declare result INT default = 0;
   call A(‘xxx’, result);
   print result;
   ```
   
   ## 2.1 使用 Mysql 协议连接 FE 执行 
   [图片]
   1. Add procedure,将 Procedure Name 和 Source(原始SQL)持久化到 Doris FE 元数据中。
   2. Call procedure,提取 Call Stmt 中的实参 Value和 Procedure Name。使用 Procedure Name 
在元数据中找到 Source,提取 Procedure 形参的 Name 和 Type,和实参 Value 匹配后组成完整变量<Name, Type, 
Value>。
   3. 执行 Doris Statement
       - 使用 Doris Logical Plan Builder 对 Source 中的 Doris Statement 
语法解析,替换参数变量,去除 into variable子句,生成符合 Doris 语法的Plan Tree。
       - 使用 stmtExecutor 执行SQL,将查询结果集迭代器封装到 QueryResult 中。
       - 将查询结果输出到Mysql Channel,或写入Cursor、出参、变量中。
       - 兼容 Mysql 协议的 Stored Programs 支持多语句。
   4. 执行 PL-SQL Statement
       - 使用 Plsql Logical Plan Builder 对 Source 中的 PL-SQL Statement 语法解析并执行,包括 
Loop、Cursor、IF、Declare 等,基本复用 HplSQL。
   
   ## 2.2 脚本执行 
   从脚本启动读取并解析SQL,通过 JDBC 连接 Doris FE 发送 Doris Statement,在本地处理返回的 Doris 
查询结果集,并返回最终结果。
   
   使用方法:
   1. Modify the jdbc path and username and password to connect to doris in the 
fe/conf/plsql-site.xml configuration file:
   ```
   <property>
     <name>plsql.conn.dorisconn</name>
     
<value>com.mysql.cj.jdbc.Driver;jdbc:mysql://fehost:queryport/database;user;password</value>
     <description>Doris jdbc connection</description>
   </property>
   ```
   2. Execute stored procedure sql through script:
   ```
   sh fe/bin/plsql.sh -f stored_procedure.sql
   ```
   
   执行流程:
   [图片]
   与使用 Mysql 协议连接 FE 执行流程类似,区别是使用 JDBC 执行 Doris Statement,并且创建的 Stored 
procedure 不会持久化到 Doris FE 元数据中,只是临时保存在本地。
   
   # 3. TODO
   1. Doris Parser support declare variable。
   2. Select Statement support insert into variable。
   3. 参数和字段重名。
   4. 如果Cursor中途退出,是否会内存泄漏。
   5. 在语法解析 LogicalPlanBuilder 中使用 getOriginSql(ctx) 获得原始SQL,特殊字符是否有问题。
   6. 支持 Map、Struct 等复杂类型。
   7. 测试 Package 等语法。
   8. 支持 UDF。
   9. 是否兼容 Oracle PL-SQL,
   
   # 4. 一些问题
   1. 不支持 JDBC 执行会返回结果的存储过程,只能把执行结果 Into 到一个变量,或写入表中,因为返回多个结果集时 JDBC 需使用 
prepareCall 语句执行,否则返回结果的 Statemnt 执行 Finalize 时 Send EOF Packet 会报错;
   2. 使用 PL-SQL Cursor 可以同时打开多个 Query 的结果集迭代器,在 Doris BE 会一直缓存这些 Query 的中间状态(比如 
HashTable)和查询结果,直到 Query 结果集迭代完成,如果 Cursor 长时间没有被使用,会导致大量内存浪费。
   3. 当前 PL-SQL Statement 在 Doris FE 计算,比如 IF、Loop、Cursor 等,这可能影响 Doris FE 
上的其他服务,并受限于 Doris FE 的性能,存储过程的创建会 Forward Master FE,执行可以在 Observer,短期考虑扩展 FE 
Observer 节点个数执行 PL-SQL Statement,更好的方案是移到 Doris BE 上执行。
   4. Doris Statement返回结果的格式是 ```xxxx\n, xxxx\n, 2 rows affected (0.03 
sec)```,PL-SQL 使用 Print 打印变量值是无格式的,JDBC 无法方便的拿到真正结果。  
   
   # 5. 一些思考
   上面执行 Doris Statement 复用 Doris Logical Plan Builder 做语法解析,自顶向下解析成一个 Plan 
Tree,并调用 stmtExecutor 执行,PL-SQL 替换变量、去除 Into Variable 等操作耦合在 Doris 
语法解析中,好处是较少的改动就可以兼容 Doris 语法,坏处是会侵入 Doris 语法解析流程。
   HplSQL是独立于 Hive 单独做了一次语法解析,实现变量替换等操作,最终输出一个符合 Hive 语法的 SQL,下面是一个简单的语法解析流程,对 
SQL 中 select、 where、expression、table name、join、agg、order 
等语法的解析都要重新实现,好处是与原系统完全独立,但改动太复杂。
   [图片]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to