zpi1223 opened a new pull request #3644:
URL: https://github.com/apache/iceberg/pull/3644


   Changelog category (leave one):
   - Documentation (changelog entry is not required)
   
   
   Changelog entry (a user-readable short description of the changes that goes 
to CHANGELOG.md):
   For situations where the description of MERGE INTO ... USING (SELECT ...) is 
not clear enough, add relevant warning information.
   
   
   Detailed description / Documentation draft:
   1.For the regular select syntax, it will be misunderstood as the from 
operation is followed, so add related warning instructions.
   2.Examples of errors in the application are as follows:
   
   - create table 
   ```
   CREATE TABLE IF NOT EXISTS hadoop_prod.schema.table16 (
   id string,
   data string,
   jydw_no string,
   ts timestamp
   ) USING iceberg
   PARTITIONED BY (bucket(12, id))
   ```
   - merge into syntax(index <- 1 to 6)
   ```
   CREATE TABLE IF NOT EXISTS hadoop_prod.schema.table16 (
   MERGE INTO hadoop_prod.schema.table16 t
   USING (select 6 as id, 'testdata6' as data, 'B36' as jydw_no) s 
   ON t.id = s.id and t.jydw_no = s.jydw_no
   WHEN MATCHED THEN UPDATE SET t.id = s.id, t.data = s.data, t.jydw_no = 
s.jydw_no
   WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index, 
'testdata, 'B3$index', CURRENT_TIMESTAMP)
   ```
   - Use Merge into wrong syntax to insert 6 pieces of data,the following error 
will be reported:
   ```
   MERGE INTO hadoop_prod.schema.table16 t
   USING (select * from hadoop_prod.schema.table16) s
   ON t.id = s.id and t.jydw_no = s.jydw_no
   WHEN MATCHED THEN UPDATE SET * || t.id = s.id, t.data = s.data, t.jydw_no = 
s.jydw_no
   WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index, 
'testdata', 'B3$index', CURRENT_TIMESTAMP)
   ```
   ```
   Exception in thread "main" org.apache.spark.SparkException: The ON search 
condition of the MERGE statement matched a single row from the target table 
with multiple rows of the source table. This could result in the target row 
being operated on more than once with an update or delete operation and is not 
allowed.
   ```
   - Use Merge into the correct syntax to insert 6 pieces of data(Correct 
execution for the first time), the initial results are as follows:
   ```
   +---+----------+---------+---------------------------+
    |id   |data         |jydw_no  |ts                                      |
   +---+----------+---------+---------------------------+
   |1     |testdata   |B31        |2021-12-02 11:29:00.275 |
   |2     |testdata   |B32        |2021-12-02 11:29:08.746 |
   |3     |testdata   |B33        |2021-12-02 11:29:11.072 |
   |4     |testdata   |B34        |2021-12-02 11:29:12.721 |
   |5     |testdata   |B35        |2021-12-02 11:29:14.056 |
   |6     |testdata   |B36        |2021-12-02 11:29:15.319 |
   +---+----------+---------+---------------------------+
   ```
   - Use Merge into the correct syntax to insert 6 pieces of data(Correct 
execution the second time), the initial results are as follows:
   ```
   +---+------------------------+---------+---------------------------+
   |id   |data                              |jydw_no  |ts                       
               |
   +---+------------------------+---------+---------------------------+
   |1     |testdata                       |B31          |2021-12-02 
11:29:00.275 |
   |2     |testdata                       |B32          |2021-12-02 
11:29:08.746 |
   |3     |testdata                       |B33          |2021-12-02 
11:29:11.072 |
   |4     |testdata                       B34          |2021-12-02 11:29:12.721 
|
   |5     |testdata                       |B35          |2021-12-02 
11:29:14.056 |
   |6     |testdata6                     |B36          |2021-12-02 11:29:15.319 
|
   +---+------------------------+----------+---------------------------+
   ```
   - Use Merge into the correct syntax to insert 6 pieces of data(Second wrong 
execution), the initial results are as follows:
   ```
   MERGE INTO hadoop_prod.schema.table16 t
   USING (select 6 as id, 'B36' as jydw_no, 'testdata7' as data) s
   ON t.id = s.id and t.jydw_no = s.jydw_no
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index, 
'testdata', 'B3$index', CURRENT_TIMESTAMP)
   ```
   ```
   +---+----------+-------------------+---------------------------+
   |id    |data         |jydw_no                |ts                             
         |
   +---+----------+-------------------+---------------------------+
   |1     |testdata   |B31                        |2021-12-02 11:29:00.275 |
   |2     |testdata   |B32                        |2021-12-02 11:29:08.746 |
   |3     |testdata   |B33                        |2021-12-02 11:29:11.072 |
   |4     |testdata   |B34                        |2021-12-02 11:29:12.721 |
   |5     |testdata   |B35                        |2021-12-02 11:29:14.056 |
   |6     |B36          |testdata7              |2021-12-02 11:29:15.319 |
   +---+----------+-------------------+---------------------------+
   ```
   3.Therefore, I think it is necessary to add this waring information to 
clearly indicate how to use it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to