rewerma commented on code in PR #4012: URL: https://github.com/apache/incubator-seatunnel/pull/4012#discussion_r1095535497
##########
docs/en/transform-v2/deduplicate.md:
##########
@@ -0,0 +1,84 @@
+# Deduplicate
+
+> Deduplicate transform plugin
+
+## Description
+
+Deduplicate rows by specified fields.
+
+## Options
+
+| name | type | required | default value |
+|----------------------| ----- | -------- |---------------|
+| duplicate_fields | array | yes | |
+
+### duplicate_fields [array]
+
+Duplicate rows by the field of array
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| id | name | age |
+|----|----------|-----|
+| 1 | Joy Ding | 20 |
+| 5 | Joy Ding | 20 |
+| 2 | Kin Dom | 14 |
+| 9 | Kin Dom | 14 |
+
+The source table data must sort by duplicate fields. For example, JDBC source:

Review Comment:
   Partitioned data is not supported yet, unless the data is first partitioned by the duplicate fields.
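
A minimal sketch of why partitioning matters here. It assumes the transform only compares each row with the previous one, which is what the sort requirement in the doc suggests but is not confirmed by the PR. The helper `dedup_adjacent`, the choice of `name` and `age` as duplicate fields, and the way rows are split into partitions are all hypothetical and are not SeaTunnel code.

```python
from itertools import groupby


def dedup_adjacent(rows, fields):
    """Keep the first row of each run of consecutive rows with equal key fields."""
    def key(row):
        return tuple(row[f] for f in fields)
    return [next(group) for _, group in groupby(rows, key=key)]


# Sample rows from the doc's example table, already sorted by the duplicate fields.
rows = [
    {"id": 1, "name": "Joy Ding", "age": 20},
    {"id": 5, "name": "Joy Ding", "age": 20},
    {"id": 2, "name": "Kin Dom", "age": 14},
    {"id": 9, "name": "Kin Dom", "age": 14},
]
fields = ["name", "age"]  # assumed duplicate_fields, for illustration only

# Single sorted stream: duplicates are adjacent, so they are removed.
single_out = dedup_adjacent(rows, fields)

# Arbitrary partitioning: duplicates land in different partitions and survive.
arbitrary_parts = [rows[0::2], rows[1::2]]
arbitrary_out = [r for part in arbitrary_parts for r in dedup_adjacent(part, fields)]

# Partitioning by the duplicate fields: duplicates stay together and are removed.
keyed_parts = [rows[0:2], rows[2:4]]
keyed_out = [r for part in keyed_parts for r in dedup_adjacent(part, fields)]

print(len(single_out), len(arbitrary_out), len(keyed_out))
```

Under these assumptions, only the arbitrarily partitioned case lets duplicate rows through, because each partition is deduplicated without seeing the other partition's rows.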
