zpi1223 opened a new pull request #3644:
URL: https://github.com/apache/iceberg/pull/3644
Changelog category (leave one):
- Documentation (changelog entry is not required)
Changelog entry (a user-readable short description of the changes that goes
to CHANGELOG.md):
For situations where the description of MERGE INTO ... USING (SELECT ...) is
not clear enough, add relevant warning information.
Detailed description / Documentation draft:
1.For the regular select syntax, it will be misunderstood as the from
operation is followed, so add related warning instructions.
2.Examples of errors in the application are as follows:
- create table
```
CREATE TABLE IF NOT EXISTS hadoop_prod.schema.table16 (
id string,
data string,
jydw_no string,
ts timestamp
) USING iceberg
PARTITIONED BY (bucket(12, id))
```
- merge into syntax(index <- 1 to 6)
```
CREATE TABLE IF NOT EXISTS hadoop_prod.schema.table16 (
MERGE INTO hadoop_prod.schema.table16 t
USING (select 6 as id, 'testdata6' as data, 'B36' as jydw_no) s
ON t.id = s.id and t.jydw_no = s.jydw_no
WHEN MATCHED THEN UPDATE SET t.id = s.id, t.data = s.data, t.jydw_no =
s.jydw_no
WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index,
'testdata, 'B3$index', CURRENT_TIMESTAMP)
```
- Use Merge into wrong syntax to insert 6 pieces of data,the following error
will be reported:
```
MERGE INTO hadoop_prod.schema.table16 t
USING (select * from hadoop_prod.schema.table16) s
ON t.id = s.id and t.jydw_no = s.jydw_no
WHEN MATCHED THEN UPDATE SET * || t.id = s.id, t.data = s.data, t.jydw_no =
s.jydw_no
WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index,
'testdata', 'B3$index', CURRENT_TIMESTAMP)
```
```
Exception in thread "main" org.apache.spark.SparkException: The ON search
condition of the MERGE statement matched a single row from the target table
with multiple rows of the source table. This could result in the target row
being operated on more than once with an update or delete operation and is not
allowed.
```
- Use Merge into the correct syntax to insert 6 pieces of data(Correct
execution for the first time), the initial results are as follows:
```
+---+----------+---------+---------------------------+
|id |data |jydw_no |ts |
+---+----------+---------+---------------------------+
|1 |testdata |B31 |2021-12-02 11:29:00.275 |
|2 |testdata |B32 |2021-12-02 11:29:08.746 |
|3 |testdata |B33 |2021-12-02 11:29:11.072 |
|4 |testdata |B34 |2021-12-02 11:29:12.721 |
|5 |testdata |B35 |2021-12-02 11:29:14.056 |
|6 |testdata |B36 |2021-12-02 11:29:15.319 |
+---+----------+---------+---------------------------+
```
- Use Merge into the correct syntax to insert 6 pieces of data(Correct
execution the second time), the initial results are as follows:
```
+---+------------------------+---------+---------------------------+
|id |data |jydw_no |ts
|
+---+------------------------+---------+---------------------------+
|1 |testdata |B31 |2021-12-02
11:29:00.275 |
|2 |testdata |B32 |2021-12-02
11:29:08.746 |
|3 |testdata |B33 |2021-12-02
11:29:11.072 |
|4 |testdata B34 |2021-12-02 11:29:12.721
|
|5 |testdata |B35 |2021-12-02
11:29:14.056 |
|6 |testdata6 |B36 |2021-12-02 11:29:15.319
|
+---+------------------------+----------+---------------------------+
```
- Use Merge into the correct syntax to insert 6 pieces of data(Second wrong
execution), the initial results are as follows:
```
MERGE INTO hadoop_prod.schema.table16 t
USING (select 6 as id, 'B36' as jydw_no, 'testdata7' as data) s
ON t.id = s.id and t.jydw_no = s.jydw_no
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT (id, data, jydw_no, ts) VALUES ($index,
'testdata', 'B3$index', CURRENT_TIMESTAMP)
```
```
+---+----------+-------------------+---------------------------+
|id |data |jydw_no |ts
|
+---+----------+-------------------+---------------------------+
|1 |testdata |B31 |2021-12-02 11:29:00.275 |
|2 |testdata |B32 |2021-12-02 11:29:08.746 |
|3 |testdata |B33 |2021-12-02 11:29:11.072 |
|4 |testdata |B34 |2021-12-02 11:29:12.721 |
|5 |testdata |B35 |2021-12-02 11:29:14.056 |
|6 |B36 |testdata7 |2021-12-02 11:29:15.319 |
+---+----------+-------------------+---------------------------+
```
3.Therefore, I think it is necessary to add this waring information to
clearly indicate how to use it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]