EricJoy2048 commented on code in PR #4012:
URL: https://github.com/apache/incubator-seatunnel/pull/4012#discussion_r1099768605

##########
docs/en/transform-v2/deduplicate.md:
##########
@@ -0,0 +1,84 @@
+# Deduplicate
+
+> Deduplicate transform plugin
+
+## Description
+
+Deduplicate rows by specified fields.
+
+## Options
+
+| name | type | required | default value |
+|----------------------| ----- | -------- |---------------|
+| duplicate_fields | array | yes | |
+
+### duplicate_fields [array]
+
+Duplicate rows by the field of array
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| id | name | age |
+|----|----------|-----|
+| 1 | Joy Ding | 20 |
+| 5 | Joy Ding | 20 |
+| 2 | Kin Dom | 14 |
+| 9 | Kin Dom | 14 |
+
+The source table data must sort by duplicate fields. For example, JDBC source:

Review Comment:
> When the data is partitioned, multiple transform objects will be created to read in parallel, and the data will be partitioned first and then sorted

+1

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
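
The partitioning concern quoted in the review comment can be illustrated with a small sketch. It assumes, without confirmation from the PR, that the transform drops a row only when it matches the immediately preceding row on `duplicate_fields`, which would explain why the docs require sorted input. The helper `dedup_adjacent` and the split by `id` parity are hypothetical and are not SeaTunnel APIs; the rows are the ones from the example table in the diff.

```python
from itertools import groupby
from operator import itemgetter


def dedup_adjacent(rows, fields):
    """Keep the first row of each run of consecutive rows that agree on `fields`.

    Hypothetical stand-in for the transform: it only compares neighbours,
    so it is correct only when the input is sorted by `fields`.
    """
    key = itemgetter(*fields)
    return [next(group) for _, group in groupby(rows, key=key)]


rows = [
    {"id": 1, "name": "Joy Ding", "age": 20},
    {"id": 5, "name": "Joy Ding", "age": 20},
    {"id": 2, "name": "Kin Dom", "age": 14},
    {"id": 9, "name": "Kin Dom", "age": 14},
]
fields = ["name", "age"]
key = itemgetter(*fields)

# One reader: the whole table is sorted, so duplicates are adjacent.
single = dedup_adjacent(sorted(rows, key=key), fields)

# Two parallel readers (assumed split by id parity): each partition is
# sorted and deduplicated on its own, so duplicates that land in
# different partitions are never compared with each other.
partitions = [[r for r in rows if r["id"] % 2 == p] for p in (0, 1)]
parallel = [
    row
    for part in partitions
    for row in dedup_adjacent(sorted(part, key=key), fields)
]

print([r["id"] for r in single])
print([r["id"] for r in parallel])
```

Under these assumptions the single-reader run keeps one row per person, while the partitioned run still contains two `Kin Dom` rows, because ids 2 and 9 end up in different partitions. Sorting inside each partition does not help unless all rows that share the duplicate fields are routed to the same partition.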
