iamhucong opened a new issue, #31466:
URL: https://github.com/apache/shardingsphere/issues/31466

   ## Background
   
   Hi community.\
   The ShardingSphere SQL parser engine helps users parse SQL to create an AST (Abstract Syntax Tree) and then visits the AST to produce a SQLStatement (Java object). Currently, we are planning to enhance support for Doris SQL parsing in ShardingSphere.
   
   More details:
   
<https://shardingsphere.apache.org/document/current/en/reference/sharding/parse/>
  
   
   
   ## Issue Background Explanation
   
   The current issue involves using a custom web scraping script to fetch SQL cases from the official website of the corresponding database. These SQL cases are then passed to ShardingSphere's parsing engine for analysis. SQL cases that fail to parse are grouped, three to five at a time, into an issue.
   1. Since the SQL cases are obtained through web scraping, it cannot be guaranteed that all of them are correct. Please follow the process below when handling this issue in a pull request (PR).
   2. Some SQL cases may have already been fixed in other PRs. For cases that already parse successfully, simply leave a comment so they can be ignored.
   3. If a SQL case parses successfully without any code changes, there is no need to add a corresponding test assert file.
   
   
   ## Task
   
   This issue is to support more Doris SQL parsing, as follows:
   
   ```
   REFRESH CATALOG hive
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   REFRESH DATABASE ctl.database1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   REFRESH DATABASE database1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   REFRESH TABLE ctl.db.table1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   REFRESH TABLE db.table1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   REFRESH TABLE table1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Utility-Statements/REFRESH/)
   
   ```
   BIGINT size(ARRAY<T> arr)
   BIGINT array_size(ARRAY<T> arr)
   BIGINT cardinality(ARRAY<T> arr)
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/2.0/sql-manual/sql-functions/array-functions/array-size/)
   
   ```
   Syntax:
       curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_start?label=xxx
       curl --location-trusted -u user:passwd -T data.file http://host:port/api/{db}/{table1}/_load?label=xxx\&sub_label=yyy
       curl --location-trusted -u user:passwd -T data.file http://host:port/api/{db}/{table2}/_load?label=xxx\&sub_label=zzz
       curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_commit?label=xxx
       curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_desc?label=xxx

   On the basis of 'MINI LOAD', 'MULTI LOAD' supports importing into multiple tables at the same time. The specific commands are shown above.
   '/api/{db}/_multi_start'  starts a multi-table import task.
   '/api/{db}/{table}/_load' adds a table to an import task. The main difference from 'MINI LOAD' is that the 'sub_label' parameter needs to be passed in.
   '/api/{db}/_multi_commit' submits the entire multi-table import task and starts processing in the background.
   '/api/{db}/_multi_abort'  aborts a multi-table import task.
   '/api/{db}/_multi_desc'   displays the number of jobs submitted by a multi-table import task.

   Description of the HTTP protocol:
       Authorization: Doris currently uses HTTP Basic authorization, so the username and password must be specified when importing. Note that this passes the password in clear text; it is only suitable for an intranet environment.
       Expect: the HTTP request sent to Doris needs an 'Expect' header with the content '100-continue'. Why? Because the request needs to be redirected before the data content is transmitted. This avoids transmitting the data multiple times, thereby improving efficiency.
       Content-Length: the request sent to Doris needs a 'Content-Length' header. If the content sent is less than 'Content-Length', Doris considers the transmission faulty and fails to submit the task. NOTE: if more data is sent than 'Content-Length', Doris only reads and imports 'Content-Length' bytes.

   Parameter description:
       user: if the user is in the default_cluster, 'user' is the user_name; otherwise it is user_name@cluster_name.
       label: specifies the label number of this import batch, which is used for later job status queries, etc. This parameter is required.
       sub_label: specifies the sub-version number inside a multi-table import task. For loads importing into multiple tables, this parameter must be passed in.
       columns: describes the corresponding column names in the import file. If it is not passed in, the order of the columns in the file is considered the same as the order in which the table was created. The format is comma-separated, for example: columns=k1,k2,k3,k4
       column_separator: specifies the separator between columns; the default is '\t'. NOTE: URL encoding is required; for example, to specify '\t' as the delimiter, pass in 'column_separator=%09'.
       max_filter_ratio: specifies the maximum ratio of non-standard data allowed to be filtered out; the default is 0, meaning no filtering is allowed. A custom value is given as, for example, 'max_filter_ratio=0.2', which allows a 20% error rate. Passing it in takes effect at '_multi_start'.

   NOTE:
       1. This import method currently completes the import work on one machine, so it is not suitable for importing a large amount of data. It is recommended that the amount of imported data not exceed 1GB.
       2. It is currently not possible to submit multiple files using `curl -T "{file1, file2}"`, because curl splits them into multiple requests, and multiple requests cannot share a label number, so this cannot be used.
       3. Using curl to import data into Doris in a streaming-like way is supported, but the real import only happens after the stream ends, and the amount of data imported this way cannot be too large.
   ```
   
   - [ ] 
[link](https://doris.apache.org/docs/dev/sql-manual/sql-statements/Data-Manipulation-Statements/Load/MULTI-LOAD/)
   
   ```
   MULTI, MINI, LOAD
   ```
   
   - [ ] 
[link](https://doris.apache.org/docs/dev/sql-manual/sql-statements/Data-Manipulation-Statements/Load/MULTI-LOAD/)
   
   ```
   CREATE TABLE test1.table2 LIKE test1.table1
   ```
   
   - [ ] 
[link](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE-LIKE/)
   
   
   
   
   ## Overall Procedure
   
   If you intend to participate in fixing this issue, please feel free to leave 
a comment below the issue. Community members will assign the issue accordingly. 
 
   
   ><u>For example, you can leave a comment like this: "Hi, please assign this 
issue to me. Thank you!"</u>
   
   Once you have claimed the issue, please review the syntax of the SQL on the 
official website of the corresponding database. Execute the SQL on the 
respective database to ensure the correctness of the SQL syntax.
   
   ><u>You can check the corresponding source of each SQL case on the official 
database website by clicking on the link provided below each case.</u> 
   
   Next, execute the problematic SQL cases mentioned above in the database (you can quickly start the corresponding database using its Docker image, and then connect to it using a client you are familiar with) to ensure that the SQL syntax itself is correct.
   
   ## Fixing ANTLR Grammar Parsing Issue
   
   Once you have confirmed the correctness of the SQL syntax, you can validate 
and fix the grammar parsing issue in ShardingSphere. 
   
   >If you are using IntelliJ IDEA, you will need to install the ANTLR plugin 
before proceeding.
   
   If ANTLR reports a parsing error, try to repair the `.g4` file by comparing it with the official database syntax until ANTLR can parse the SQL correctly.
   
   When there is no error message in the ANTLR Preview window, it means that 
ANTLR can correctly parse the SQL.  
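
   For orientation, the REFRESH statements listed above could be covered by grammar rules along these lines. This is only a sketch: the rule and token names here are hypothetical and should be aligned with the conventions used in ShardingSphere's existing Doris `.g4` files.

   ```
   // Hypothetical rules for the REFRESH statements (names are illustrative).
   refreshCatalog
       : REFRESH CATALOG identifier
       ;

   refreshDatabase
       : REFRESH DATABASE (identifier DOT_)? identifier
       ;

   refreshTable
       : REFRESH TABLE tableName
       ;
   ```

   `REFRESH TABLE ctl.db.table1` would then require `tableName` (or an equivalent rule) to accept an optional catalog and database qualifier.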
   
   ## Visitor problem fix
   
   After ANTLR parses SQL into an abstract syntax tree, ShardingSphere will 
access the abstract syntax tree through Visitor and extract the required 
information.
   If you need to extract Segments, you first need to execute:
    ```
    mvn -T 2C clean install -DskipTests
    ```
   under the `shardingsphere-parser` module to compile the entire parser module.\
   Then override the corresponding visit method in the SQLStatementVisitor as needed to extract the corresponding Segment.
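
   As a toy illustration of this extraction step (this is NOT ShardingSphere's real API; every class and method name below is invented), a visitor walks the parse tree and returns a Segment-like value object:

   ```java
   // Minimal parse-tree node standing in for an ANTLR AST node.
   interface ParseNode {
       <T> T accept(NodeVisitor<T> visitor);
   }

   final class RefreshCatalogNode implements ParseNode {
       final String catalogName;
       RefreshCatalogNode(String catalogName) { this.catalogName = catalogName; }
       public <T> T accept(NodeVisitor<T> visitor) { return visitor.visitRefreshCatalog(this); }
   }

   // Visitor interface: one visit method per node type.
   interface NodeVisitor<T> {
       T visitRefreshCatalog(RefreshCatalogNode node);
   }

   // A Segment-like value object extracted from the tree.
   final class CatalogSegment {
       final String name;
       CatalogSegment(String name) { this.name = name; }
   }

   // Analogous to overriding a visit method in the generated statement visitor:
   // the override pulls the needed information out of the node and wraps it.
   final class RefreshStatementVisitor implements NodeVisitor<CatalogSegment> {
       public CatalogSegment visitRefreshCatalog(RefreshCatalogNode node) {
           return new CatalogSegment(node.catalogName);
       }
   }
   ```

   In the real codebase the node types are generated by ANTLR from the `.g4` grammar, and the visit method returns one of the existing Segment classes instead of a hand-written one.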
   
   ## Add assertion test file  
   
   After the above SQL parsing problem is repaired, the corresponding Test 
needs to be added. 
   The steps are as follows:
   1. Add the corresponding `sql-case` in the `sql/supported` directory.
   2. Add case assertions in the case directory of the `shardingsphere-test-it-parser` module.
   3. Run `org.apache.shardingsphere.test.it.sql.parser.internal.InternalSQLParserIT`.\
    After the SQL Parser IT runs successfully, you can submit a PR.
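
   For example, a `sql-case` entry for the first REFRESH statement might look like this (the `id` value and file placement are illustrative; mirror the structure of the neighboring case files in the repository):

   ```
   <sql-cases>
       <sql-case id="refresh_catalog" value="REFRESH CATALOG hive" db-types="Doris" />
   </sql-cases>
   ```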
   
   ## Relevant Skills
   
   1. Master the Java language
   2. Have a basic understanding of ANTLR g4 files
   3. Be familiar with Doris SQL
   

